VEGETAL SEQUENCES INCLUDING A POLYMORPHIC
SITE AND THEIR USES

The genomes of all organisms undergo spontaneous
mutation in the course of their continuing evolution,
generating variant forms of progenitor sequences (Gusella,
Ann. Rev. Biochem. 55, 831-854 (1986)). The variant form may
confer an evolutionary advantage or disadvantage relative to 
a progenitor form or may be neutral. In some instances, a 
variant form confers a lethal disadvantage and is not 
transmitted to subsequent generations of the organism. In 
other instances, a variant form confers an evolutionary 
advantage to the species and is eventually incorporated into 
the DNA of many or most members of the species and 
effectively becomes the progenitor form. In many instances, 
both progenitor and variant form(s) survive and co-exist in 
a species population. The coexistence of multiple forms of a 
sequence gives rise to polymorphisms. 

Several different types of polymorphism have 
been reported. A restriction fragment length polymorphism 
(RFLP) means a variation in DNA sequence that alters the 
length of a restriction fragment as described in Botstein et 
al., Am. J. Hum. Genet. 32, 314-331 (1980). The restriction 
fragment length polymorphism may create or delete a 
restriction site, thus changing the length of the 
restriction fragment. RFLPs have been widely used in human
and animal genetic analyses (see WO 90/13668; WO 90/11369;
Donis-Keller, Cell 51, 319-337 (1987); Lander et al.,
Genetics 121, 85-99 (1989)). When a heritable trait can be
linked to a particular RFLP, the presence of the RFLP in an
individual can be used to predict the likelihood that the
individual will also exhibit the trait.
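By way of illustration only, the following Python sketch shows how a single base change can create or delete a restriction site and so change the fragment lengths on which an RFLP is scored. The EcoRI site (G^AATTC), the function names and the two allele sequences are assumptions chosen for the example; they are not taken from this application.

```python
def cut_positions(seq: str, site: str, cut_offset: int) -> list:
    """Top-strand cut coordinates for every occurrence of a recognition site.

    cut_offset is where, inside the site, the enzyme cuts the top strand
    (1 for EcoRI, which cuts G^AATTC).
    """
    seq, site = seq.upper(), site.upper()
    cuts = []
    start = seq.find(site)
    while start != -1:
        cuts.append(start + cut_offset)
        start = seq.find(site, start + 1)  # also catches overlapping sites
    return cuts


def fragment_lengths(seq: str, site: str, cut_offset: int) -> list:
    """Fragment lengths after a complete digest of a linear sequence."""
    bounds = [0] + cut_positions(seq, site, cut_offset) + [len(seq)]
    return [end - begin for begin, end in zip(bounds, bounds[1:])]


# Hypothetical alleles differing at one base; allele_b has lost the EcoRI site.
allele_a = "TTTTGAATTCTTTTTTTTTT"
allele_b = "TTTTGATTTCTTTTTTTTTT"
print(fragment_lengths(allele_a, "GAATTC", 1))  # [5, 15]
print(fragment_lengths(allele_b, "GAATTC", 1))  # [20]
```

In an actual RFLP assay the fragments are detected by hybridization with a labelled probe, so only fragments overlapping the probe would be visible.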

Other polymorphisms take the form of short
tandem repeats (STRs) that include tandem di-, tri- and
tetra-nucleotide repeated motifs. These tandem repeats are
also referred to as variable number tandem repeat (VNTR)
polymorphisms. VNTRs have been used in identity and
paternity analysis (US 5,075,217; Armour et al., FEBS Lett.
307, 113-115 (1992); Horn et al., WO 91/14003; Jeffreys,
EP 370,719), and in a large number of genetic mapping
studies.
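A minimal sketch of how a tandem-repeat allele can be scored as a copy number, assuming perfect (uninterrupted) repeats; the function name and the (AC)n alleles are hypothetical.

```python
def longest_tandem_run(seq: str, motif: str) -> int:
    """Largest number of consecutive, perfect copies of `motif` in `seq`."""
    if not motif:
        raise ValueError("motif must not be empty")
    seq, motif = seq.upper(), motif.upper()
    k = len(motif)
    best = 0
    for start in range(len(seq)):
        copies, pos = 0, start
        # Walk forward one motif length at a time while the copies continue.
        while seq[pos:pos + k] == motif:
            copies += 1
            pos += k
        best = max(best, copies)
    return best


# Two hypothetical alleles of an (AC)n repeat locus.
print(longest_tandem_run("GGACACACACTT", "AC"))    # 4
print(longest_tandem_run("GGACACACACACTT", "AC"))  # 5
```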

Other polymorphisms take the form of single
nucleotide variations between individuals of the same
species. Such polymorphisms are far more frequent than
RFLPs, STRs and VNTRs. Some single nucleotide polymorphisms
occur in protein-coding sequences, in which case one of the
polymorphic forms may give rise to the expression of a
defective or other variant protein. Other single nucleotide
polymorphisms occur in noncoding regions. Some of these
polymorphisms may also result in defective or variant
protein expression (e.g., as a result of defective
splicing). Other single nucleotide polymorphisms have no
phenotypic effects. Single nucleotide polymorphisms can be
used in the same manner as RFLPs and VNTRs, but offer
several advantages. Single nucleotide polymorphisms occur
with greater frequency and are spaced more uniformly
throughout the genome than other forms of polymorphism. The
greater frequency and uniformity of single nucleotide
polymorphisms mean that there is a greater probability that
such a polymorphism will be found in close proximity to a
genetic locus of interest than would be the case for other
polymorphisms. Also, the different forms of characterised
single nucleotide polymorphisms are often easier to
distinguish than other types of polymorphism (e.g., by use
of assays employing allele-specific hybridization probes or
primers).

Despite the increased amount of nucleotide
sequence data generated in recent years, only a minute
proportion of the total repository of polymorphisms has so
far been identified. The paucity of polymorphisms identified
so far is due to the large amount of work required for
their detection by conventional methods. For example, a
conventional approach to identifying polymorphisms might be
to sequence the same stretch of DNA in a
population of individuals by dideoxy sequencing. In this type
of approach, the amount of work increases in proportion to
both the length of sequence and the number of individuals in
a population, and becomes impractical for large stretches of
DNA or large numbers of subjects.

SUMMARY OF THE INVENTION 

The invention provides nucleic acid segments
containing at least 10, 15 or 20 contiguous bases from a
vegetal fragment including a polymorphic site, notably a
single nucleotide polymorphism (SNP). In a particular
embodiment, the vegetal fragment is not from a plant of the
Cruciferae family.

The segments can be DNA or RNA, and can be
double- or single-stranded. Some segments are 10-20 or 10-50
bases long. Preferred segments include a diallelic
polymorphic site. In a preferred embodiment, the invention
concerns nucleic acid segments from a fragment shown in
Table I (corn).
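The segments described above are windows of a chosen length that overlap the polymorphic site. The sketch below enumerates them for one Table I sequence, written with its first-listed allele at the site; 0-based coordinates and the function name are assumptions of the example.

```python
def segments_with_site(seq: str, site: int, length: int) -> list:
    """All contiguous segments of `length` bases that contain position `site` (0-based)."""
    if not 0 <= site < len(seq):
        raise ValueError("site lies outside the sequence")
    first = max(0, site - length + 1)    # leftmost window still covering the site
    last = min(site, len(seq) - length)  # rightmost window that fits in the sequence
    return [seq[s:s + length] for s in range(first, last + 1)]


# Table I entry ATAATACTTGATATGCCATT[G/T]TGTCCTCTTATTTTTAACAT, allele G at the site.
left, right = "ATAATACTTGATATGCCATT", "TGTCCTCTTATTTTTAACAT"
seq = left + "G" + right
segments = segments_with_site(seq, site=len(left), length=10)
print(len(segments))              # 10
print(segments[0], segments[-1])  # TATGCCATTG GTGTCCTCTT
```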

The invention further provides allele-specific
oligonucleotides that hybridize to a segment of a vegetal
fragment, for example a fragment in Table I. These
oligonucleotides can be probes or primers. Also provided are
isolated nucleic acids comprising a sequence of Table I or
the complement thereto, in which the polymorphic site within
the sequence is occupied by a base other than the reference
base shown in Table I.
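For illustration, an allele-specific probe can be built for each allele by centring the polymorphic base between equal flanks taken from a Table I sequence; its reverse complement targets the other strand. The 8-base flank and the function names are assumptions, and practical probe design would also take hybridization conditions into account.

```python
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def allele_specific_probes(left: str, alleles: tuple, right: str, flank: int = 8) -> dict:
    """Map each allele to (probe, reverse-complement probe) centred on the SNP."""
    if flank < 1:
        raise ValueError("flank must be at least 1")
    probes = {}
    for allele in alleles:
        probe = left[-flank:] + allele + right[:flank]
        probes[allele] = (probe, reverse_complement(probe))
    return probes


# Table I entry ATAATACTTGATATGCCATT[G/T]TGTCCTCTTATTTTTAACAT
probes = allele_specific_probes("ATAATACTTGATATGCCATT", ("G", "T"), "TGTCCTCTTATTTTTAACAT")
for allele, (plus, minus) in probes.items():
    print(allele, plus, minus)
```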

The invention further provides a method of
analyzing a nucleic acid from a subject. The method
determines which base or bases is/are present at any one of
the polymorphic vegetal sites, for example those of Table
I. Optionally, a set of bases occupying a set of the
polymorphic sites shown in Table I is determined. This type
of analysis can be performed on a plurality of subjects who
are tested for the presence of a phenotype. The presence or
absence of the phenotype can then be correlated with a base or
set of bases present at the polymorphic sites in the
subjects tested.
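The text does not specify how the correlation is tested. One common choice, assumed here, is Pearson's chi-square statistic on a 2 x 2 table of base versus phenotype for a diallelic site; the function name and subject data below are hypothetical.

```python
from collections import Counter


def chi_square_2x2(observations, bases):
    """Pearson chi-square (no continuity correction) for base x phenotype.

    observations: iterable of (base_at_site, has_phenotype) pairs.
    bases: the two alleles of the diallelic site, e.g. ("G", "T").
    """
    table = Counter(observations)
    b1, b2 = bases
    a, b = table[(b1, True)], table[(b1, False)]
    c, d = table[(b2, True)], table[(b2, False)]
    n = a + b + c + d
    denominator = (a + b) * (c + d) * (a + c) * (b + d)
    if denominator == 0:
        return 0.0  # an empty row or column: the table carries no evidence
    return n * (a * d - b * c) ** 2 / denominator


# Hypothetical subjects: 10 carry G (8 show the phenotype), 10 carry T (2 show it).
subjects = ([("G", True)] * 8 + [("G", False)] * 2
            + [("T", True)] * 2 + [("T", False)] * 8)
print(chi_square_2x2(subjects, ("G", "T")))  # 7.2
```

With one degree of freedom, a statistic above about 3.84 corresponds to p < 0.05.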

DEFINITIONS 

A nucleic acid, such as an oligonucleotide, can be DNA or RNA, and single- or
double-stranded. Oligonucleotides can be naturally occurring
or synthetic, but are typically prepared by synthetic means.
Preferred nucleic acids of the invention include segments of
DNA, or their complements, including any one of the
polymorphic sites shown in Table I. The segments are usually
between 5 and 100 bases, and often between 5-10, 5-20,
10-20, 10-50, 20-50 or 20-100 bases. The polymorphic site
can occur at any position of the segment. The segments
can be from any of the allelic forms of DNA shown in Table
I. Methods of synthesizing oligonucleotides are found in,
for example, Oligonucleotide Synthesis: A Practical Approach
(Gait, ed., IRL Press, Oxford, 1984).

Hybridization probes are oligonucleotides 
capable of binding in a base-specific manner to a 
complementary strand of nucleic acid. Such probes include 
peptide nucleic acids, as described in Nielsen et al.,
Science 254, 1497-1500 (1991).

The term primer refers to a single-stranded
oligonucleotide capable of acting as a point of initiation
of template-directed DNA synthesis under appropriate
conditions (i.e., in the presence of four different
nucleoside triphosphates and an agent for polymerization,
such as DNA or RNA polymerase or reverse transcriptase) in
an appropriate buffer and at a suitable temperature. The
appropriate length of a primer depends on the intended use
of the primer but typically ranges from 15 to 30
nucleotides. Short primer molecules generally require cooler
temperatures to form sufficiently stable hybrid complexes
with the template. A primer need not reflect the exact
sequence of the template but must be sufficiently
complementary to hybridize with a template. The term primer
site refers to the area of the target DNA to which a primer
hybridizes. The term primer pair means a set of primers
including a 5' upstream primer that hybridizes with the 5'
end of the DNA sequence to be amplified and a 3' downstream
primer that hybridizes with the complement of the 3' end of
the sequence to be amplified.
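The primer-pair definition can be made concrete with a short sketch that locates the region amplified from a template. It assumes the usual convention that the forward primer matches the top strand at the 5' end of the target and the reverse primer is the reverse complement of the top strand at its 3' end; the function names, template and 6-base primers are hypothetical and far shorter than real primers.

```python
from typing import Optional

COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def amplicon(template: str, forward: str, reverse: str) -> Optional[str]:
    """Region of `template` (top strand, 5'->3') amplified by a primer pair."""
    template = template.upper()
    start = template.find(forward.upper())
    if start == -1:
        return None  # forward primer site absent
    reverse_site = reverse_complement(reverse.upper())
    end = template.find(reverse_site, start + len(forward))
    if end == -1:
        return None  # reverse primer site absent downstream
    return template[start:end + len(reverse_site)]


template = "TTTACGTACGGGGCATGGAAAA"
print(amplicon(template, "ACGTAC", "TCCATG"))  # ACGTACGGGGCATGGA
```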

Linkage describes the tendency of genes, 
alleles, loci or genetic markers to be inherited together as 
a result of their location on the same chromosome, and can 
be measured by percent recombination between the two genes, 
alleles, loci or genetic markers. 
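Percent recombination can be computed by counting progeny gametes whose marker/trait combination differs from both parental combinations. The sketch assumes gametes scored from a testcross; the function name and data are hypothetical.

```python
def percent_recombination(gametes, parental_haplotypes):
    """Percentage of gametes whose marker/trait combination is non-parental."""
    gametes = list(gametes)
    if not gametes:
        raise ValueError("no gametes scored")
    parental = set(parental_haplotypes)
    recombinants = sum(1 for g in gametes if g not in parental)
    return 100.0 * recombinants / len(gametes)


# Hypothetical testcross: the parent carried (marker A, trait T1) and (marker B, trait T2).
gametes = ([("A", "T1")] * 45 + [("B", "T2")] * 45
           + [("A", "T2")] * 5 + [("B", "T1")] * 5)
print(percent_recombination(gametes, [("A", "T1"), ("B", "T2")]))  # 10.0
```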

Polymorphism refers to the occurrence of two or
more genetically determined alternative sequences or alleles
in a population. A polymorphic marker or site is the locus
at which divergence occurs. Preferred markers have at least
two alleles, each occurring at a frequency of greater than 1%,
and more preferably greater than 10% or 20% of a selected
population. A polymorphic locus may be as small as one base
pair. Polymorphic markers include restriction fragment
length polymorphisms, variable number of tandem repeats
(VNTRs), hypervariable regions, minisatellites,
dinucleotide repeats, trinucleotide repeats, tetranucleotide
repeats, simple sequence repeats, and insertion elements
such as Alu. The first identified allelic form is
arbitrarily designated as the reference form and other
allelic forms are designated as alternative or variant
alleles. The allelic form occurring most frequently in a
selected population is sometimes referred to as the wildtype
form. Diploid organisms may be homozygous or heterozygous
for allelic forms. A diallelic polymorphism has two forms. A
triallelic polymorphism has three forms.
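The frequency criterion for preferred markers can be checked by counting alleles in a sample of diploid genotypes. A minimal sketch, assuming genotypes are given as pairs of bases; the function names and the sample are hypothetical.

```python
from collections import Counter


def allele_frequencies(genotypes):
    """Allele frequencies from diploid genotypes given as 2-tuples, e.g. ("G", "T")."""
    counts = Counter(allele for genotype in genotypes for allele in genotype)
    total = sum(counts.values())
    return {allele: n / total for allele, n in counts.items()}


def meets_frequency_criterion(freqs, threshold=0.01):
    """True if at least two alleles each occur at a frequency above `threshold`."""
    return sum(1 for f in freqs.values() if f > threshold) >= 2


# Hypothetical sample of 10 diploid individuals at a G/T site.
genotypes = [("G", "G")] * 6 + [("G", "T")] * 3 + [("T", "T")]
freqs = allele_frequencies(genotypes)
print(freqs)                                   # {'G': 0.75, 'T': 0.25}
print(meets_frequency_criterion(freqs, 0.20))  # True
```

The most frequent allele, here G, would be the wildtype form in the sense defined above.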

A single nucleotide polymorphism occurs at a
polymorphic site occupied by a single nucleotide, which is
the site of variation between allelic sequences. The site is
usually preceded by and followed by highly conserved
sequences of the allele (e.g., sequences that vary in less
than 1/100 or 1/1000 members of the populations).

A single nucleotide polymorphism usually arises
due to substitution of one nucleotide for another at the
polymorphic site. A transition is the replacement of one
purine by another purine or one pyrimidine by another
pyrimidine. A transversion is the replacement of a purine by
a pyrimidine or vice versa. Single nucleotide polymorphisms
can also arise from a deletion of a nucleotide or an
insertion of a nucleotide relative to a reference allele.
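The transition/transversion distinction, and the single-base insertion/deletion case, can be expressed directly in code. The sketch assumes alleles written as in Table I, with an empty string standing for a deleted base; the function name is hypothetical, and multi-base variants such as [CG/AC] are outside its scope and raise an error.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def classify_polymorphism(allele1: str, allele2: str) -> str:
    """Classify a single nucleotide polymorphism from its two alleles."""
    a, b = allele1.upper(), allele2.upper()
    if len(a) + len(b) == 1:
        return "insertion/deletion"  # one allele is empty, e.g. [G/] or /G
    if len(a) != 1 or len(b) != 1 or a == b or not {a, b} <= PURINES | PYRIMIDINES:
        raise ValueError(f"not a single-base substitution: {allele1}/{allele2}")
    if {a, b} <= PURINES or {a, b} <= PYRIMIDINES:
        return "transition"
    return "transversion"


for pair in [("G", "A"), ("C", "T"), ("G", "T"), ("A", "C"), ("G", "")]:
    print(pair, classify_polymorphism(*pair))
```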

Hybridizations are usually performed under
stringent conditions, for example, at a salt concentration
of no more than 1 M and a temperature of at least 25°C. For
example, conditions of 5X SSPE (750 mM NaCl, 50 mM
sodium phosphate, 5 mM EDTA, pH 7.4) and a temperature of 25-30°C
are suitable for allele-specific probe hybridizations.

Nucleic acids of the invention are often in
isolated form. An isolated nucleic acid means an object
species that is the predominant species present (i.e., on a
molar basis it is more abundant than any other individual
species in the composition). Preferably, an isolated nucleic
acid comprises at least about 50, 80 or 90 percent (on a
molar basis) of all macromolecular species present. Most
preferably, the object species is purified to essential
homogeneity (contaminant species cannot be detected in the
composition by conventional detection methods).

DESCRIPTION OF THE PRESENT INVENTION
I. Novel Polymorphisms of the Invention
The present application provides, for example,
oligonucleotides containing polymorphic sequences isolated
from Gramineae species, for example maize. The invention also
includes various methods for using those novel
oligonucleotides to identify, distinguish, and determine the
relatedness of individual strains or pools of nucleic acids
from plants.

EXAMPLES 

Example 1. Maize DNA extraction

DNA was extracted from maize lines as described
in Rogers and Bendich (1988, Plant Mol. Biol. Manual A6: 1-10)
with the modification described in Murigneux et al. (1993,
Theor. Appl. Genet. 86: 837-842).

PCR amplification was done on six maize lines
representing a wide range of genetic variability and
including both European flint material and US dent
germplasm. These six maize lines were chosen to
maximize the genetic variability of cultivated maize, which
maximizes the chance of finding polymorphisms in the
allelic sequences. For example, G1, a European flint line,
and G3, a US Corn Belt Stiff Stalk line, are completely
unrelated. Their genetic distance (coefficient of
dissimilarity) calculated with our standard approach (89
RFLP probe/enzyme combinations and Nei-Li distance) is 0.69.
This value is close to the maximum distance between two
cultivated maize lines.

Of the 15 pairwise genetic distances among
these 6 lines, 8 are greater than 0.6, 6 are between 0.5 and
0.6, and only one is below 0.5. This shows that the choice of
the lines avoided, as far as possible, potential
redundancy (or similarity) of alleles at the loci sequenced.
With the same sequencing effort, we should therefore have
collected the maximum number of polymorphisms.
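The Nei-Li coefficient is usually computed from shared bands as D = 1 - 2Nxy/(Nx + Ny), where Nx and Ny are the numbers of bands scored in lines x and y and Nxy the number they share. The sketch below applies that formula to invented band sets (the actual RFLP data are not given here) and confirms that 6 lines yield 15 pairwise distances; the function names are hypothetical.

```python
from itertools import combinations


def nei_li_distance(bands_x: set, bands_y: set) -> float:
    """Nei-Li dissimilarity: 1 - 2 * shared bands / (bands in x + bands in y)."""
    total = len(bands_x) + len(bands_y)
    if total == 0:
        raise ValueError("no bands scored")
    return 1.0 - 2.0 * len(bands_x & bands_y) / total


def pairwise_distances(lines: dict) -> dict:
    """Distance for every unordered pair of lines (line name -> band set)."""
    return {(x, y): nei_li_distance(lines[x], lines[y])
            for x, y in combinations(sorted(lines), 2)}


# Hypothetical band sets (probe/enzyme fragment identifiers) for two lines.
print(nei_li_distance({"p1a", "p2a", "p3a", "p4a"}, {"p3a", "p4a", "p5a", "p6a"}))  # 0.5
print(len(list(combinations(["G1", "G2", "G3", "G4", "G5", "G6"], 2))))            # 15
```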

Genotypes:

G1 = flint line
G2 = flint line
G3 = dent line
G4 = dent line
G5 = dent line
G6 = dent line

Example 2. Choice of the markers

The markers were chosen according to the following
criteria.

1. Selection of markers that give a single band
in Southern hybridization. This is to avoid as much as
possible the problems of duplicated sequences (very frequent
in plants). If the same (or nearly the same) sequence occurs
at several positions in the genome (locus 1 and locus 2) and
if the primers used to type the SNP found at locus 1 also
allow amplification of the sequence at locus 2, the result of
hybridization on the chips will be the superposition of two
marker patterns and therefore impossible to use.

2. Distribution on the genome: most
genetic analyses in plants aim to characterize the whole
genome (evaluation of genetic variability, mapping of
quantitative trait loci, backcross-assisted selection). The
second criterion was therefore to choose markers evenly
distributed over the 10 chromosomes (see Table A below for
map positions).

3. Selection of genes coding for enzymes involved
in carbon metabolism. Wx1, Ae1, Sh2, Bre1, Bt1, Ssu and
Bt2 are involved in sugar-starch metabolism. This choice
allows a very fast characterization of the
allelic variability (possibly linked to efficiency) of genes
involved in this metabolism.

The markers used are listed in Table A.

LEGEND OF TABLE A
Probe = name of the marker
COD = in-house code
MAP Pos = map position, given by the bin location on the
University of Missouri map (Maize Genetic Newsletter no. 69,
1995). Examples of reading the "MAP Pos" and "PRIM" columns:
1.01-1.02 means that the probe is the core probe delimiting
bins 1.01 and 1.02 on chromosome 1
5.01 means that the probe is located in bin 5.01 (on chromosome 5)
4 means that the probe is located on chromosome 4
S01F is the forward primer for probe 1
S01R is the reverse primer for probe 1
Genbank/EMBL = GenBank/EMBL accession number
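The notation explained in this legend is regular enough to parse. The sketch below, with hypothetical function names, splits a MAP Pos entry into chromosome and bins and a primer name into marker code and direction; the trailing digit of names such as S10F2 is kept but not interpreted, since its meaning is not stated.

```python
import re

PRIMER_NAME = re.compile(r"^(S\d+)([FR])(\d*)$")


def parse_map_pos(pos: str):
    """Return (chromosome, bins) for entries such as '1.01-1.02', '5.01' or '4'."""
    parts = [p.strip() for p in pos.split("-") if p.strip()]
    if not parts:
        raise ValueError("empty map position")
    chromosome = int(parts[0].split(".")[0])
    bins = [p for p in parts if "." in p]  # a bare chromosome number has no bin
    return chromosome, bins


def parse_primer_name(name: str):
    """Split a primer name such as 'S01F' or 'S10R2' into (code, direction)."""
    match = PRIMER_NAME.match(name)
    if match is None:
        raise ValueError(f"unrecognised primer name: {name}")
    code, strand, _suffix = match.groups()  # suffix digit kept but not interpreted
    return code, "forward" if strand == "F" else "reverse"


print(parse_map_pos("1.01-1.02"))  # (1, ['1.01', '1.02'])
print(parse_map_pos("4"))          # (4, [])
print(parse_primer_name("S10R2"))  # ('S10', 'reverse')
```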
TABLE A

Csnpld (33 markers)

PROBE     COD   Map Pos      PRIM    SEQUENCES OLIGOS          Genbank/EMBL
UMC157    S01   1.01-1.02    S01F    CGCACGCACATTAGCTTTCG      G10822
                             S01R    TGCAACCGAACAGGATCTGC      G10823
UMC76     S02   1.02-1.03    S02F    ATTATTCGGCGTCCAGCCCC      G10865
                             S02R    TTACCAGCGGTGAGAGCTGC      G10866
UMC67     S03   1.05-1.06    S03F    CGTTCGTGTGGCATCAATCG      G10864
                             S03R    CGACATCATCATCGGCAACC      G13173
UMC161    S06   1.10-1.11    S06F    CAGACCTTGGTTGGAGGCAAC     G10824
                             S06R    TCGCTCCCCTTCTTCCTTCC      G10925
UMC53     S08   2.01-2.02    S08F    CGGACGTGATGCAAGTTTCG      G10851
                             S08R    AGCGGCTCAAGCTCTCCATC      G10852
UMC131    S10   2.04-2.05    S10F2   TCCTTGGCACTCACGCTACC      G10816
                             S10R2   AGCATGGGGGGCAACAACTC      G10817
UMC49     S12   2.08-2.09    S12F    CAGAGAGCCGTCTCGAATCG      G10845
                             S12R    TTGATACTGCCGTCTGCCG       G10846
UMC102    S14   3.04-3.05    S14F    TGCTGTGCTGTCACATGGCG      G10801
                             S14R    CTGGGTCGTCGTGCTTTGAG      G10802
UMC63     S16   3.08-3.09    S16F2   ACGCCCTGACAGAACCATCG      G10857
                             S16R    TTGCTCACTCGTGGTCGTGG      G10857
Adh2      S17   4.03         S17F2   TGCCTGCTGCATCTCTAGCC      X02915
                             S17R2   CAAGCCCGAAAATCGCCAC       X02915
UMC66     S19   4.06-4.07    S19F    TGGAGTGTCCAAAGACCGACC     G10862
                             S19R    ACCAAAACGGGTGGTCTGCC      G10863
UMC90     S22   5.01         S22F    GCAGGTGAACAATGCTGCCC      G10870
                             S22R    CCAAAAGGCGGAGAACCGAC      G10871
Ae1       S23   5.05         S23F    TCGCTGGGGTTTTAGCATTG      L08065
                             S23R    CACTCGAACTCTGTTCAAGGCTTG  L08065
UMC59     S26   6.01-6.02    S26F    TCCAAAGCGAAAGCCTGATG      G10853
                             S26R    TACGATGGCCGTGACCCTTC      G10854
UMC65     S27   6.03-6.04    S27F    TTCCAGCTTTCCTCGGCACC      G10860
                             S27R    AGCAGCAAGAGCAGAGCGTG      G10861
UMC21     S28   6.04-6.05    S28F    TGCAGATGTGCCTTTCCTGTG     G10830
                             S28R    CAGTGGATTCGCTCCCTTCTC     G10831
UMC132    S29   6.06-6.07    S29F    CGCACAGAGGCAGATGCAGC      G10824
                             S29R    CGCTAGGCAGAGGTTCGAGC      G10819
UMC254    S33   7.03-7.04    S33F    CCGGGCGCAAAAGAATGTG       G10832
                             S33R    AAGAAACCAGCACCAGCGGG      G10833
UMC80     S34   7.04         S34F    TCGCCTTTATCGGTGCAATG      G10867
                             S34R    TGGAGCAAGCATGGAGATCG      G10868
BNL9-11   S38   8.01-8.02    S38F2   CGAGGGAATGTCATCAACCC      G10778
                             S38R2   ACCAAAGCTCCTCAGCCAAG      G10779
UMC109    S42   9.00-9.01    S42F    GCACCGTCGTTTACCTCAAGC     G13177
                             S42R    TAGCCATCATCAGCGGCGTG      G10807
Wx1       S43   9.02-9.03    S43F    CGTGCTACCTCAAGAGCAAC      X03935
                             S43R    ACTTCACGGCGATGTACTTG      X03935
UMC95     S44   9.04-9.05    S44F    CACTCGGAAGTCGGAATCGC      G10872
                             S44R    ACCTTCGCAGTGTTGCGGAC      G10872
CSU61     S45   9.05-9.06    S45F    TCTCCACGAATCCCACCGTC      T12691
                             S45R    AAGGGAGGGAATCCTCTACCG     T12691
UMC130    S48   10.02-10.03  S48F    AAGGGGGAAGAAGGTCATC       G10814
                             S48R    CGATGGCAACAACTACCAGTAG    G10815
CSU109    S53   2.09         S53F    GCTTTCGGTTCCGGATAGCG      T12721
                             S53R    ACTGGGCCATCTCCGACCAG      T12721
UAZ77     S56   5.04         S56F2   GCAACCAACTGCAACATCGC      T18762
                             S56R2   GAAGGAGCTCAAGGCCAAGG      T18762
Sh1       S57   9.01         S57F    TGCTGTTATTGCGTGCCGTG      X02382
                             S57R2   AAGGTGGCACCAAGGCGTTC      X02382
Sh2       S63   3.09         S63F    TTCTTCACTGCACCCCGATG      M81603
                             S63R    CTGCTCACTCTGCAATGCCC      M81603
Bre1      S65   6            S65F    AGCAGCAGATCAGGCACACC      U17897
                             S65R    TTGAAGTTCGTTTCGGGCAC      U17897
Bt1       S66   5            S66F    [illegible in source]     M79333
                             S66R    TAGCGTGGAGGACGTTCTGG      M79333
Ssu       S67                S67F    GCAAGCAAGCAAGCAGCGAG      D00170
                             S67R    GACCCGAAGCAAAACCGAAC      D00170
Bt2       S71   4            S71F    TGCCGAAAAAGGTGGCATTC      Seq (Bae et al. 1990)
                             S71R    GCCCCCAATGTCGATTCAAC





Example 3. PCR amplification

PCR amplification was done with primers designed
using the DNA sequences of the markers listed above. The
sequences for all markers/genes were available in
GenBank/EMBL.

Forward and reverse primers are given in
Table A above.
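How the primers were designed is not stated. As a rough, assumed check of their suitability for the 60°C annealing step of the PCR program, the sketch computes GC content and the Wallace-rule estimate Tm = 2(A+T) + 4(G+C) °C for a few Table A primers. The Wallace rule is only an approximation for short oligonucleotides, and the function names are hypothetical.

```python
def gc_fraction(primer: str) -> float:
    """Fraction of G and C bases in a primer."""
    primer = primer.upper()
    return (primer.count("G") + primer.count("C")) / len(primer)


def wallace_tm(primer: str) -> int:
    """Rough melting temperature in deg C: 2 per A/T plus 4 per G/C (Wallace rule)."""
    primer = primer.upper()
    at = primer.count("A") + primer.count("T")
    gc = primer.count("G") + primer.count("C")
    return 2 * at + 4 * gc


# A few primers from Table A.
for name, seq in [("S01F", "CGCACGCACATTAGCTTTCG"),
                  ("S01R", "TGCAACCGAACAGGATCTGC"),
                  ("S12R", "TTGATACTGCCGTCTGCCG")]:
    print(name, len(seq), round(gc_fraction(seq), 2), wallace_tm(seq))
```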

PCR conditions were as follows.

For each 30 microliter reaction: DNA: 60 ng; Taq DNA
polymerase (Amersham): 0.9 unit; 10x buffer: 3 microliters;
dNTPs: 0.2 mM each; MgCl2: 1.5 mM; BSA: 0.8 mg/ml;
primers: 1.5 ng/microliter each; glycerol: 5%.

Polymerisation was done in a Perkin Elmer 9600:
1' at 95°C, followed by 35 cycles of (30" at 94°C, 30" at
60°C, 1'30" at 72°C), followed by 1'30" at 72°C.
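The thermal program can be written as data and its total hold time summed. This sketch ignores ramping between temperatures, which depends on the instrument; the constant and function names are illustrative.

```python
# Each step is (temperature in deg C, hold time in seconds).
INITIAL = [(95, 60)]
CYCLE = [(94, 30), (60, 30), (72, 90)]
CYCLES = 35
FINAL = [(72, 90)]


def hold_time_seconds(initial, cycle, cycles, final) -> int:
    """Sum of programmed hold times; ramp time between steps is not included."""
    return (sum(t for _, t in initial)
            + cycles * sum(t for _, t in cycle)
            + sum(t for _, t in final))


total = hold_time_seconds(INITIAL, CYCLE, CYCLES, FINAL)
print(total, "s =", total / 60, "min")  # 5400 s = 90.0 min
```

The same kind of arithmetic gives 1.5 ng/microliter x 30 microliters = 45 ng of each primer per reaction.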

The 186 maize amplicons were then sequenced with
the primers used for allele amplification. DNA
sequences were edited and aligned. Sequences surrounding
polymorphisms (see Table I below) were collected from
these alignments.

LEGEND OF TABLE I (taking the Bt2 gene as an example)

Column 1 (Bt2) gives the name of the marker or gene.

Column 2 (Bt2-G2/G6-1) gives:
- the name of the marker (Bt2)
- the first genotype number (G2)
- the second genotype number (G6)
- and the number of the SNP (single nucleotide
polymorphism). In this case, it is an SNP found in the Bt2
nucleotide sequence between genotypes (maize lines) G2 and
G6, and this SNP was numbered 1 (sometimes there are several
SNPs between two genotypes for the same sequence).

Column 3 is similar to column 2, but uses the code of the
marker/gene.

Column 4 gives the sequence carrying the SNP. In brackets,
[G/T] means that the sequence of G2 at this position of the
Bt2 gene is G, while for G6 it is T.

Similarly, /G (CSU61-G1/G5-1A) means deletion of the base
pair G in G1 compared to G5.
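The Table I notation can be parsed mechanically. The sketch below assumes entries of the form flank[allele1/allele2]flank, with an empty allele for a deletion, and rebuilds the full sequence of each allele; the function names are hypothetical, and entries damaged in the source text would not match and raise an error.

```python
import re

# Flank, [allele1/allele2], flank; either allele may be empty (a deletion).
ENTRY = re.compile(r"^([ACGT]+)\[([ACGT]*)/([ACGT]*)\]([ACGT]+)$")


def parse_entry(entry: str) -> dict:
    """Split a Table I sequence into its flanks and its two alleles."""
    match = ENTRY.match(entry.replace(" ", "").upper())
    if match is None:
        raise ValueError(f"unparsable entry: {entry}")
    left, allele1, allele2, right = match.groups()
    return {"left": left, "alleles": (allele1, allele2), "right": right,
            "indel": allele1 == "" or allele2 == ""}


def allele_sequences(parsed: dict) -> tuple:
    """Full sequence of each allele, first-listed genotype first."""
    return tuple(parsed["left"] + a + parsed["right"] for a in parsed["alleles"])


snp = parse_entry("ATAATACTTGATATGCCATT[G/T]TGTCCTCTTATTTTTAACAT")
print(snp["alleles"], snp["indel"])                  # ('G', 'T') False
deletion = parse_entry("TCAGCCCCTACTACGCCGAA[G/]GAGCTCATCTCCGGCATCGC")
print([len(s) for s in allele_sequences(deletion)])  # [41, 40]
```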
TABLE I



DO 


ttrt^P*>JPI5 4 


S71G2/G6-1 


OSU 


<I*mi_ /"■■■ 1 V/"»C •« 
oSU- is l/V»9-l 


567G1/G5-1 


Ssu 




56/GVG3-1 






Cft7ft4 Afl'l-O 


Ssu 




oo roifUiK) 


Btl 
ATAATACTTGATATGCCATT[G/T]TGTCCTCTTATTTTTAACAT
ATGGCCTCGTCGGCCACTGC[A/C]GTCGCTCCGTTCCATGGGCT
GCCGCTCCTCCAGAAGCCTC[G/A]GCAACGTCAGCAACGGCGGA
GTGTTGCCCATCCCATCCCA[A/T]TTCCCAACCCCAAACGAACC
GTACCTGCCGCCGCTGTCGA[CG/AC]GGACGACCTGCTGAAGCAGG
AGTGAGCCCGCTTCTTATTCjrrjrAAGGTGATAGGTTTCTAAA
AATGTAATGGTACTCCGCGC[T/C]ATGGCTCTGGTACTTAGGAA
AAATAGGCTCGGGCAATTATjCy]CAGCTTAGGGACAGCAAGCG
TCC^CCCTGCCTCCX3GTTTT[A/T]GCCCGACCTTCGAAACATTC
ACC^CTGACGTAGCAC^TCC[G/T]ACTTCTCGTTGTAAAACCCC
GGAGGTTCGCCTCATGTTAT[C/T]GTTGACGAGCCACATCCACT
GCTCCGACTTO^TCTTGAJA^|CCTCCAC^CTGCCTCCGGTT
CTGGTTGAAATGTGTTGAAG[C/A]TACTAGTGATGAACTGCTTG
GCTGCTCCAAGCGAGCCCGC[C/G]CCGAAAAAGGAAAAAGGTGA
GCTGCT(XAAGC<^GCCCGC[C/G]cXGAAAAAGGAAAAAGTTGA
CX5CCCCGAAAAAGGAAAAAG[G/T]TGAAGGTCCTTACTCACCGA
CGCGCCGAAAAAGGAAAAAG[G/T]TGAAGGTCCTTACTCACCGA
GAACCGGCCACAGTGCCTGA[T/A]TTTGGCGGTGAGACCTCTTC
GAACCGGCCACAGTGCCTGA[T/A]TTTCGCCGTGAGACTTCTTC
CAATTGTTACCTGAGCAAGA[T/]TTTTGTGTACTTGACTTGTT
TACTTGAGAGAATGCAACAT[C/G]AGCATTCTGTGATTGGAGTC
TTTTAGTGTACTTGACTTGT[C/T]CTCCTCCACAGATGAAATAT
TTTTTGTGTACTTGACTTGT[C/T]CTCCTCCACAGATGAAATAT
TCTGTGATTGGAGTCTGCTC[G/A]CGTGTCAGCTCTGGATGTGA
MCTACAAAAAGCATCTCCT|G/1)GGATTTGGCTATCTCCTTTT
TTAGCGCGAAAAAAAAACTCy I J I 1 11) 11 1 1 G TCCTTTTACT
TCMTCCMTCAATTTMTTTr/QICTTC^
TTACTACXjAAAAAGTCTTGA[G/T]TCTAGGAATTTGAATTTGTG
CTTCTTGGATTTTGCTATCTpVCK^
CTCCTTGGATTTTGCTATCT[T/C]CTTTTACTACGAAAAACTCT
TTTTACTACGAAAAGCATCT[T/C]CTTGGATTTTGCTATCTTCT
TTTTACTACGAAAAGCATCT[T/C]CTTGGATTTTGCTATCTCCT
GAAGCCAAATCCTATTATTT[T/C]CTGCCTCTAGGGTCTGAATG
GTACACTGTTAGAATCACACn"/GJTAGTGAAGCGCAACACAGAT
GCXTTATCATCCTCTAGGTA[T/A]TGGAGACGAGTGACCAGTCT
C^^TCTTGAGACCCGAGCC[C/T]CCAATCGCGCCCTTCTGTGC
CTTTTCrrCV\GACCX^GC^OTlCCMTCGa3C^
GAGCCCCCAATCGCGCCCTT[C/T]TGTGCCTTGGCCT
GAGCCTC<^MTCGCGC^CTT[C/T]TGTGCCTTGGCCTTGAGCTC
GAAGGAGCAGCAGCGC^GG[A/]ACGTGTTCCAAGTCAACGTC
GTAGAAAGTTAGCAAAAACA[T/]TTTTTTAGTGA^
ATTGTGGCTAGAAACTTTGGp iJI Mi l l 1 AAATTATGGTCAT
CAGATCGGTTGTCCTCAGAGIA^^GTCACCTACCTGCAAACC
MTTCTACATAGGAGTCATGICfTlACVVA^
ACAAGTACTTGTTTAAAGGA[C/]CATGCCGGAATAC^
GAGCX3AGATCGATCCTGTrTGn*A^)CATCCATCACT
GAGCGAGATCGATCCTGTTGiT/C^T^
TAGTCATAGCAACACCATOqG/AJTO
CMTTGAAGAGGAAAAAAAAjrTjnn^ACATA^
CAGAGACTCCACAAGGCGAA[A/C]GGAGTCCA^
CCCA*>3G0GGGAGATGGTGG[T/]TAGAAGCGGAA^
ACTTGmAAAGGACATGCCtGOGGA^
CXXAGGCCTTCOCAOGGGGG|VG]GATGG
CAAAGCAGAGACTCCAC^AGIA/G^
GAACAGAGTCOGCMTAOTTITA^TCCTM
GATTCAOAAACAGTGGCGGC(Art>)GA^
ATGAGTATATTCAAOTCATAn'A7nX37XW«rrAGAATGiTA^
CCTAGACGCTGACCGCCACA[G/A]ACGG^
CCTAAACGCTOAjCXGCCACAJG/AJACOGC^
tgaacaaaccatgcgctacc^c/nag^
tcogog0macaacatccga(q/71tt^
gggaggggaaaaaama<^g/ajagc<;ttggt^
ggcggctgccaaat0cgcgg^a1aa>^^
CTAGAATGTTATTTCTTCAC[C/A]GTTGACCATGG
ctagmtgttatttcttcac^a)gttgaccatggaaa^
ncaccgttgaccatggaaa(a«3)aaa^
ttcacagttxiaccatggaaafvg^aaacagtaataagt^
ttctrcacagttgaccatgg^ajaamaaacagtaataagttc
gaacccacxx3tgccctggga(a3}gg^^
GAACCCACCGTGCCCTGGGA(^)GGGAAAAAAa^
TGGGAGGGAAAAAAAAAGAA[G/A]AGOSTTGGTTW
CGTACCAGCTAGGMTCGTA(A^JAAAAGCCTAGACGCTGACCG
GCTGCGTCAATCATCACTTCn'/AJCCCACAGGCGTCAAGTA^
GACAGATTCCAAAGTAGTC/3[C/T]CGGCCAGGTC<5A^
GGCGCTGCX3TCAATC^TCAC(AJ77TCACCCA4^^
TCGGTGTCACCACATGCATA[T/G]TCAGGACAGATTCCAAACTA
TCGGTGTCACCAGATGCATA[T/G]TCAGGACAGATTCCAAAGTA
GTCGCCGGCGAGGTCGAAAA[G/A]GAATACTCAGCAAAAGACCC
GTCGTCGGCCAGGTCGAAAA[G/A]GAATACTCAGCAAAAGACCC
TATTCAGGACAGATTCCAAA[C/G]TAGTCGCCGGCCAGGTCGAA
TAGTCAGGACAGATTCCAAA[C/G]TAGTCGCCGGCCAGGTCGAA
GCGTCAAGTACAGATACGCA[A/G]CACGCCTCAGCTTCGCCTTG
CCTGGGACTCCX5CAAATTGC[G/A]AGCACTCGGTGTCACCACAT
GCTGGTTCATTATCTGACCT[G/T]GATTGCATTGCAGCTACAAG
CTGGATTGCATTGCAGCTAC[A/G]AGAAGCCCGTGGAAGGCCGG
GCTGGTTCATTATCTGACCT[G/T]GATTGCATTGCAGCTACX5AG
CTTGATTGCATTGCAGCTAC[A/G]AGAAGCCCGTGGAAGGCCGG
TCAGCCCCTACTACGCCGAA[G/]GAGCTCATCTCCGGCATCGC
TACCCGGAGCTGAACCTCCC[C/G]GAGAGATTCAAGTCGTCCTT
TGCATGTGAACATTCATGAA[T/C]GGTAACCCACAACTGTTCGC
CTCCTACCAGGGCCGGTTCGrr/JCXJTTCTCCGACTACCCGGAG
TGAATGGTAACCCACAACTG[C/T]TCGCGTCCTGCTGGTTCATT
GCCGACAGGGTCCTCACCGT[G/C]AGCCCCTACTACGCCGAAGA
GCTGGTTCATTATCTGACCT[G/T]GATTGCATTGCAGCTACAAG
GCTGGTTCATTATCTGACCT[G/T]GATTGCATTGCAGCTACGAG
CTGGATTGCATTGCAGCTAC[A/G]AGAAGCCCGTGGAAGGCCGG
CTTGATTGCATTGCAGCTAC[A/G]AGAAGCCCGTGGAAGGCCGG
TCAGCCCCTACTACGCCGAA[G/]GAGCTCATCTCCGGCATCGC
TGCATGTGAACATTCATGAA[T/C]GGTAACCCACAACTGTTCGC
CTGGTGGTGGTGCTTCTCTG[AAAC/]TGAAACTGAAACTGACTGCA
GACCATCTTCACGTACTACC[TACC/]AGACCGCTTTCTGCATCCAC
CTGACCATCTTCACGTACTA[CCTA/]CCAGACCGCTTTCTGCATCC
CTCCTACCAGGGCCGGTTCGnVICCTTCTCCGACTACCCGGAG
GAGATTCAAGTCGTCCTTCG[G^TTTCATCGACGGGTCTGTT
TGAATGGTAACCCACAACTG[C/T]TCGCGTCCTGCTGGTTCATT
GCCGACAGGGTCXTCAC^TTG/CIAGCCCCTACTAC^
TCTGACCATCTTCACGTACTTACCT/^CCAGACCGCTTTCTGCATC
CTTGATTGCATTGCAGCTAC[G/A]AGAAGCCCGTGGAAGGCCGG
CTGGATTGCATTGCAGCTAC[G/A]AGAAGCCCGTGGAAGGCCGG
GCTGGTTCATTATCTGACCT[T/G]GATTGCATTGCAGCTACGAG
AGAGATTCAAGTCGTCCTTC[G/]GATTTCATCGACGGGTCTGT
CTCCATGAAAAAGCTGCCGC^JTACTCTCTt^GTCAGCTACT
CTGX^CTCCGATTGAGGGTC[C/G]GAAGCAGGGCAGCGCGTGTG
CTGCACTCCGATTGAGGGTC[C/G]GAAGGAGGGCAGCGCGTTGT
CTGCACTCGGATTGAGGGTC(CAS)GAAGCAGGGCAGCGCGTTTG
CTGCACTCCGATTGAGGGTC[C/G]GAAGCAGGGCAGCGCGTTT^
CATGCCTCTGTTGATATTTqGX^TGCACCTTTTGCT^
GATTTTGTAGGTTGATGCAT[C/T]GTTTGATCTTTCT^
TGCTTGCAACTAMTTAATCiAASTTGCTCTAm
ACATGTCCAGGACGCATGGTJC^CCCMTATTGTTGTTGGAAG
TTGAICI I1CI lATCTCCTTVqCG^TTTGTTCTOTGTTATA
TTGATCTTTCTTATCTCCTTVqCGAATTTGTO
TXnAGGACTTGGAGAGCTTG{A^G7TMTrTACACATG^
CATGCCTCTGTTGATA 1 1 1 1 [U /UJUTGCACCTTTTGCTTGCAAC
GATTTTGTAGGTTGATGCAT[C/T]GTTTGATCTTTCTTATCTCC
GAC^CATTrCCTACTCMTA[C/T]AATTATTTGAT
AGTATCACAGACTAATCTGA[A/G]TATCTGGTTGCCACGAAAAC
AGTATCACAGACTMTCTG>{A/Gn'ATCTGGTTGCCACAAAAAC
TCAAAGTGGTGCMTCGCAATT^CICCACTTGGGCTTGCCGTGGT
CCACTTGGTOTTGCCGTGGTIC^JCCTATCGTACGCAGGTAGCA
AGCAI 1 1 1 1 1GI 1 1 IUI 1 1 UIA^CCTTGGCAGACAACAGACAG
CAGTCCCGAG^TCCCAAATXC^CAGAAAAAGGTTTTG T^
CAGTrCCCGAGMTCCCAMTTC/)CAGAAA
GGCAGACAACAGAGAGATCA[AG^CA)CATGCTTGCATTTACTCCCA
GOCAGACAACAGACAGATCA(AG^CA}CATGCTTGCATT^
GTGATCACAGACTMTCTX3AiAA^ATCTGGTTGCCACGAAAAC
GTGATCACAGACTAATCTOAjA/GTr ATCTGGTTGCCACAAAAAC
TCTGAATATCTGGTTGCCAC[G/A]AAAACCG<XiACACAAGAGAG
TCTGAGTATCTGGTTGCCAC[G/A]AAAACCG<X3ACACAAGAGAG
TCAGTCAAACTCAGTCCCGA^A^G]MTCCCAMTCAGAAAAAGG
GGTTGC€ACGAAAACraGGA(C/GyU>^
GGTTG<XACGAAAACCGGGA{CA5JACAAGAGAGAAACTCAAAGT
ACGCATGCTrGCATTTACTC[C/T]CAGTCAAACTCAGTOCCGAA
ACACATOCTTGCATTTACTC(C/TK^GTCAAACTCAGTCCCGAA
TATTATTCMTTTTGAATAA{tt)GAAGGAMTTTTAGCACCTC
ATTAATAAATGCATCCTCTG[C/G]TAAAAAAACCCATTTrGAAT
ATGAATTGAAGCTCTGAATA[C/T]AGAATCCAOCATTCTTCCGA
GAATCGAGCATTCrrrCC6AA[A/G]CTGCTTCCTACAAAACTOGA
GAAAGGATGTG \ 1 1 1 1 GATA(QVA>CCTTCAQTCTTTCAGATGGA
CAATGTCTTGTTCGTTATCA[A/G]CGAAAGTTrGAATCCCCACA
TGTATCGGCTAGTCTGGATG[G/A]TCGCACTGGCACTCAGTGCT
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S27G3/G6-1A 


UMC65 


UMC65-G3/G6-1B 


S27G3/G6-1B 


UMC65 


UMC65-G3VG6-2 


S27G3/G&-2 


UMC65 


UMC65-G3/G&-3 


S27G3/G&-3 


UMC59 


UMC59-GVG&-1 


UMC59-G5K36-1 


UMC59 


UMC59-G5/G6-2 


UMC59-G5/G6-2 


UMC59 


UMC59-G5/G&-2B 


UMC53-G5/G6-2B 


UMC59 


UMC59-G5/G6-3 


UMC53-G5/Ge-3 


UMC59 


UMC59-G4/G5-1 


UMC59-G4/G5-1 


UMC59 


UMC59-G4A35-2 


UMC59-G4/G5-2 


UMC59 


UMC59-G3/G4-1 


UMC59-G3/G4-1 


UMC59 


UMC59-G3/G4-2 


UMC59-G3/G4-2 


UMC59 


UMC59-G3/G4-3 


UMC59-G3/G4-3 


Afl1 


Ae1-G4/G5-1 


S23G4/G5-1 


Afl1 


Ae1-G4/G5*2 


S23G4/G5-2 


Aa1 


Ae1-G37G6-1 


S23G3/G6-1 


Ae1 


Ae1-G5/G3-1 


S23G5/G3-1 


Ao1 


Ae1-G5/G3-1B 


S23G5/G3-1B 


Aa1 


A01-G1/G6-1 


S23G1/G6-1 


Aal 


Ae1-G1/G5-1 


S23G1/G5-1 


Aa1 


Ae1-G1/G4-1 


S23G1/G4-1 


UMC90 


UMC90-G5/G6-1 


S22G5/G6-1 


UMC90 


UMC90-GaG6-2 


S22G5/G6-2 


UMC90 


UMC9CW3SG6-3 


S22G5/G6-3 


UMC66 


UMC66-G5/G6-1 


S19G5/G6-1 


UMC66 


UMCG6-65/G6-2 


S19G5/G6-2 


Adh2 


Adh2-G4K5S-1 


S17G4/G6-1 


Adh2 


Adh2-G3/G6-1 


S17G3VG6-1 


UMC63 


UMC63-G4/G6-1 


S1GG4/G6-1 


UMC63 


UMC63-G2/G6-1 


S16G2/G6-1 


UMC63 


UMC63-G2/G6-2A 


S16G2/G6-2A 


UMC63 


UMC63-G2/G6-2B 


S1GG2&6-2B 


UMC63 


UMC63-G27G6-3A 


S16G2/G6-3A 


UMC63 


UMC63-G1X36-1 


S16G1/G6-1 


UMC63 


UMC63-G1/G3-2A 


S16G1/G3-2A 


UMC63 


UMC63-G1/G3-2B 


S1SG1/G3-2B 


UMC63 


UMC63-G1/G2-1 


S16G1/G2-1 


UMC102 


UMC102-G5/G6-1 


S14G5/G6-1 


UMC102 


UMC102-G&G6-1B 


S14GS/G6-1B 


ASG24 


ASG24-GSG6-1 


S13GVG6-1 


ASG24 


ASG24-G2/G6-1 


S13G2/GS-1 


UMC49 


UMC49-G4/G6-1 


S12G4G6-1 


UMC49 


UMC49-G27G5-1 


S12G2/G5-1 


UMC49 


UMC49-G2/G5-2 


S12G2/G5-2 


UMC49 


UMC49-G2/G5-3 


S12G2/GS-3 


UMC49 


UMC49-G2/G5-4 


S12G2/GW 


UMC49 


UMC49-G2/GS4B 


S12G2/G5-4B 


UMC49 


UMC4W32/G4-1 


S12G2/G4-1 


UMC49 


UMC49-G2/G4-2 


S12G2/G4-2 


UMC49 


UMC43-G2/G4-3 


S12G2/G4-3 


UMC49 


UMC49-G2/6^4 


S12G2/G3-4 


UMC49 


UMC49-G1/G6-1 


S12G1/G6-1 


UMC49 


UMC49-G1/G6-1B 


S12G1/G6-1B 


UMC49 


UMC49-G1/G5-1 


S12G1/G5-1 


UMC49 


UMC49-G1/G5-2 


S12G1/G5-2 


UMC49 


UMC49-G1/G5-3 


S12G1/GM 


UMC49 


UMC49-G1/G5-4 


S12G1/G5-4 


UMC131 


UMC131-G4/G6-1 


S10G4/G6-1 
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csnpld 

TCTATTGAGCAGTCTGAGAA(GCA/CT]AGGATGGTCGGCTTCTTCAG 

CCTTACACTATTAACAGGCC[C/T]GTGATCTACCTGAATGCCTG 

CAAGAAGCCTCTTCAGTGTC(A/C|GTCGTAGCTTCCTCAAGACC 

AGACCTTCCTGATGTGCGGA(T/C)GCTAATCCATGGAGCAGGGA 

AAG ACCTCCTGATGTGCGGAJT /C) GCTAATCCATGGAG CAG GGA 

CTAATCCATGGAGCAGGGAG {G/AJAAGGG GCGAGGGGCAG CAAG 

TCGTCGCGMTACAGCCGGG[G/CJGAGGGGGTGGTCGCGACTGG 

GTCGTAGCTTCCTCAAGACCnVJTCCTGATGTGCGGACGCTAA 

GAGTCGTCGCGAATACAGCC{A/G}GGGGAGGGGGTGGTCGCGAC 

AGGGGGTGGTCGCGACTGGAfT/GJCGCCCGAGCAGCGAGCAAGC 

MGCACATGTTTTAACCTTTnVGJATTCAAACTTTCCAGCCGTT 

AAGCACATG TTTTAACCTTTfT/G JATTCAAACTTTCCAGCGTTA 

GAATGTTGCTGTTATATTACfT /CJCGTAGGTGACAAAGGGTTCA 

AGAAAAATTTACATAAAAAA(G/C}CACACTCCATGATTGTTAAA 

AGAAAAATTTACATAAAAAAtG/CJCACACTCCATGATTGTTTAA 

CTTTTATTCAAACmCCAG[/qCGTTAATTTGTTATCCGTTG 

TGTTGMCATGCTCTCAGGA(/CqCCCCCTATTGTGACACAGCA 

TACATCTTAACAAGCACATt^G/TTTlTAACCTTTT ATTCAAACTTT 

AGTAATGTGTGACTGTGGGC[C/G]CGTGTGACAGCTTTTACGTA

AGTAGTGTGTGACTGTGGGC[C/G]CGTGTGACAGCTTTTACGTA

TTCGCTTGGTAGCCGTAGCA[G/A]TATACTTTTACCGGCCACAG

GGGCTTTGGGTTGTGMCTT[CCA/C)AAAAAAAAAAAAAATTTCCC 

CCAAGAAAGATTAATGCTGG{/TJTAAMTATTGTTTCCAGTCT 

AAAATCAGGACTGCGAAAAA[A/C]CCAAGAAAGATTAATGCTGG

AAAATCAGGACTGCGAAAAA[A/C]CCAAGAAGATTAATGCTGGT

AAAGTGTGTGTTGTTGCCCA[G/A]ATGATTCCATTCCACACAAG

AGGACTGCGAAAAAAGCAAGpAJAAGATTAATGCTGGTAAAAT 

ATGCTGGTAAAAT A I I GI 1 1 ^ qCAGTCTTTCACAAAGTGTGT 

CTACAAAAATCAGGACTGCG [/AJAAAAACCAAGAAGATTAATG 

TTGTTTCAGTCTTTCACAAA^GTIGTGTGTGTGCCAGATGATTC 

TCACACACCGACCTGCCTGG^/TJTATCAGGAACCATCCTCCTG 

GGTGAATTGGTGATGCATGC[T/G]GGGGGTGCTCGAGTTGGATG

TTCCAGTCGGATGAACTGGAn"/GK*TTCG7CATCCACTCGTC^ 

GGTGMTTGGTGATGCATGCJAOIGGGGGTGCTCGAGTTGGATG 

TTMGTGAAGATGCCCAAAqC/GIGTTAAACTTTCCATGGAACT 

ATTAATGAAGATGCCCAAAC[C/G]GTTAAACTTTCCATGGAACT

TGATTCGGGTCTGTATGCGA[G/T]TGTTGTGGTGGTGAACTGGT

CGGGTCTGTATGCGAGTGTT[G/A]TGGTGGTGAACTGGTGAATT

GTTCGCGGTTTCTGGGGCCG[G/T]GGGCGGTGCTCGGTGGGGCC

CAGATTGGTGTCGTTTACTA[A/G]AATTCAGTTCTGTCCATTTG

AAGTAAGCATTCTTTATATGI/Tn*ACTTCCCATGATAAACTTT 

CAAAGGGCTTACTGTACTTTpQCATCTTATTGGGACGGCACC 

ACTTGGCCGGGGACGTCGAC[G/A]ATCGTCGTAGCACTACTGGT

AGTACATGGCGAGCGTTGTA[G/C]CAGCTGCTTAGGTGATGTGG

CTATTTCCAAGCTAACAACC[C/G]CTCTTGGTCCCAACATCCTG

GGTTCTAAACATAGCTCGTC[C/A]ATTCATGATTCATCTCGAGC

TCAGCAAGCCTCCAAGGCTC[C/A]AATGGTCCAGTTACTTGGTT

GTGTGTAGCTTCATTCGCAAn^ATrTTTGAACAGCCTCTGCAAGT 

GIGCI 1 1 CGTAAACCTAGAGpr^GACCAGCTGTGATTTCQGT 

GTGCTrTCGTAAACCTAG^CIlWOCAGCTG^ 

GCTGAGCAGCTGTGATTTOG(G/APt^ATTCCACGACCACGAGT 

TGTGTAGCTTCATTCGCAAA(GnTTT7GAACAGCCTCTGCAA^ 

GTGCTTCCGTAAACCTAGAGrT/CfrGACCAGCTGTGATTTCXiAT 

GTGCTTCCGTAAACCTAGAGCT/CTTGADCAGCTGTGATTTCGGT 

GTGTGTAGCTTCATTOGCAA(A/7)GTTTGAACAGCCTCTGCAAG 

GCTCAGCTGCCGGAGTACGT[A/T]GGCTTGCTCTCCGGCCGGCC

ATAGCTCTGCCGGAGTACGT[A/T]GGCTTGCTCTCCGGCCGGCC

TTTCACAACTCAACTGATTGp/TJCTTGCTTTGAT 

TTGGTAATTTCAGAGCTAGA(C^GJAACTT ACTGTGGTACACGCC 

ACCTTTGCTGTG 1 1 1 1 1 1 1 1 [ \ /G)GTATTOGAATGGAGGGAGTA 

AAAACAGCCMGGTGGTGGTICA5JAAAGGAAGGTGTCAGAAGGT 

TCTGTTCGTTCCATCTCTTT[A/G]CAGTAAATATCCGTAATTAC

CGTMTTACTTTGTTACTACfTA^CJAGTMTTTTATATATATCCT 

TATATATATCCTCATTTCAA(Arr}GAACAGTCAMGTTAGTTTT 

TATATATATCCTCATTTCAAIAniGAACAGTCAAAGTAGTTTTG 

TATTTCTTATCCAGGATTGTTT/qCTTTGGCCAAAGCATGGTAC 

CGTTCCATCTCTTTACAGT A(AX^ATATCCGTAATTACTTTGTT 

ATCCGTMTTACTTfGTTACPA/ACp TAAGT MTTTTATATATAT 

GTAATTACTTTGTTACTACTIA/JAOTAATTTTATATATATCCT 

CTGTG f 1 mill 1 GQTATTTG/C)GAATQQAGGGAGTATTATTT 

GCTGTG 1 1 Ml II I G QTATTyVC)QAATGQAGQQAGTATTATTT 

ACTT AGATGATGACCAGGTG(A/)AGAQTTTGGCACCTTTGCTQ 

AGTTTGGCACCTTTGCTGTGrT/J I 111 I 1 1 1 GGTATTGGAATG 

CTTTACTGATTGGGTTACAA[A/G]AGGTTATTTCTTATTCAGGC

AATT ACTTTGTT ACTACCAG [T/JTAATTTTATATATATCCTCC 

AGCGACAGGGATGTCGAGCAJGnTCTACGGAAGGCAATAATGAG 
UMC131


UMC131-G4/G6-2 


S10G4/G6-2 


UMC131 


UMC131-G3/G6-1 


S10G3/G6-1 


UMC131 


UMC131-G3/G6-2A 


S10G3/G6-2A


UMC131 


UMC131-G3/G6-2B 


S10G3/G6-2B 


UMC131 


UMC131-G1/G6-1


S10G1/G6-1 


UMC53 


UMC53-G5/G6-1 


UMC53-G5/G6-1


UMC53 


UMC53-G5/G6-2


UMC53-G5/G6-2 


UMC53 


UMC53-G4/G6-1 


UMC53-G4/G6-1


UMC53 


UMC53-G4/G6-2 


UMC53-G4/G6-2 


UMC53


UMC53-G3/G6-1 


UMC53-G3/G6-1 


UMC53 


UMC53-G3/G5-1 


UMC53-G3/G5-1 


UMC53 


UMC53-G3/G5-2 


UMC53-G3/G5-2


UMC53 


UMC53-G3/G4-1 


UMC53-G3/G4-1


UMC53 


UMC53-G1/G4-1 


UMC53-G1/G4-1 


UMC161 


UMC161-G2/G3-1 


S06G2/G3-1


UMC161 


UMC161-G2/G3-2 


S06G2/G3-2 


UMC161 


UMC161-G2/G5-2B 


S06G2/G3-2B 


UMC107 


UMC107-G2/G4-1


S05G2/G4-1 


UMC67 


UMC67-G5/G6-1 


S03G5/G6-1


UMC67


UMC67-G2/G6-1


S03G2/G6-1


UMC76 


UMC76-G4/G6-1 


S02G4/G6-1 


UMC76 


UMC76-G2/G6-1


S02G2/G6-1 


UMC76 


UMC76-G2/G6-1B


S02G2/G6-1B 


UMC76 


UMC76-G2/G5-1


S02G2/G5-1 


UMC76 


UMC76-G2/G5-1B


S02G2/G5-1B 


UMC76 


UMC76-G2/G5-1


S02G2/G5-1


UMC76 


UMC76-G2/G5-1B


S02G2/G5-1B 


UMC76 


UMC76-G2/G5-2


S02G2/G5-2 


UMC76 


UMC76-G2/G5-2B


S02G2/G5-2B


UMC76 


UMC76-G2/G5-3 


S02G2/G5-3


UMC76 


UMC76-G2/G5-3B


S02G2/G5-3B


UMC76 


UMC76-G2/G5-3C


S02G2/G5-3C 


UMC76


UMC76-G2/G5-3D


S02G2/G5-3D 



csnpld 



AATTTGGGAAAATCAATGCA|GAATCAC)ATCAGTGATTAATCCACATA 

GCATGGCGGAGTGAGGGAGGfTGVJTGTGTGTGTGTGGCTCCACA 

GGCCGCTACGCCATTTAGCG[G/A]ATTTGGGAAAATCAATGCAG

GGCCGCTACGCGATTTAGCG[G/A]ATTTGGGAAAATCAATGCAC

GATCCCCGCCGGCAGAACAA[C/G]GTACGAGAAGGATGGAATGC

GTCCCAGATCAGGTCCACGTTT/qCGAGCTCGCTGTTCCCGCTT 

TGGTTCTTCACCACCACCX5C[CX3]CCGGGCGCGCCCAGCGCCTC 

GCAGCCTCAGGTACACGGGG[/A|AAGTCGGAGTGGTTCTTCAC 

GCCGGGCGCGCCCAGCGCCTl/CJCGTCCCAGATCAGGTCCACG 

GCACGTCGTTGGTGAAGAAG[AC/CA]GCGGTACGGGTGCTTGTCGA 

AGGTACACGGGGAAGTCGGA[G/T]TGGTTCTTCACCACCACCGC

CGACGG03TCCAGCACCGAC[GyjCCTCCGCCTTCACCCCGCGC 

GTCCACGTCGAGCTCGCTGT[C/T]CCCGCTGCCCACGACGGCGT 

GCACGTCGTTGGTGAAGAAG[A/CpVGCGGTACGGGTGCTTGTCG 

NAACCAAACCCTGACTATTA[T/C]AGGTAGATTAGACTAGACAC

ACGGTGAGGAGTGGCACATG[A/C]GATGGAAAGTTCCTGTAGAC

ACGGTAAGGAGTGGGACATG[A/C]GATGGAAAGTTCCTGTAGAC

TATGCTTGGAAAGTGGG AAAJG/] GGG AACATACGATGGAGG AC 

AAACAATMTTTTTACACAG(/TJTGCTAAG G TTTT ACTG TTTT 

ATATCCATGTTGTCGCCTGC(/TG]TGTGCGCTTGCTTGCCGCTA 

TTGCTGCTATGTTTACTGGG[/TJTGTAGAAAAAAAAATAATAT 

GCTCGGTAATAATTCTGGCTICW5]CGATGGCACCCATATTCCTC 

GCTCGGTAATAATTCTGGCT[C/G]CGATGGCACCCATATTCCTG

AAAACACGTGGTGTTTGTTA[G/A]GAAAGACCTAGTTTCTCGGC

AAATCACGTGGTGTTTGTTA[G/A]GAAAGACCTAGTTTCTCGGC

TAGTTTCTCGGCAATTGGCA[G/T]TGTGGAATGACCATCTCGTG

TAGTTTCTCGGCAATTGGCA|GTI7TGTGGAATGACCATCTCGTC 

GTGTGGAATGACCATCTCGT[G/C]GTGATGCCAGCATGCTGTTA

GTGTGGAATGACCATCTCGT[G/C]GTGATGCCAGCATGCTACTA

ACCCTGTCAGGCTTCCACAG[A/C]TATAATATTTGTTGTGGTGT

ACTCTGTCAGGCTTCCACAG[A/C]TATAATATTTGTTGTGGTGT

ACTCTGTCAGGCTTCCACAG[A/C]TATAATATTTGTTGTGTGTG

ACCCTGTCAGGCTTCCACAG[A/C]TATAATATTTGTTGTGTGTG
Example 4. Analysis of Polymorphisms

A. Preparation of Samples

Polymorphisms are detected in a target nucleic acid from a plant
being analyzed. Target nucleic acids can be genomic DNA or cDNA.
Many of the methods described below require amplification of DNA
from target samples. This can be accomplished by, e.g., PCR. See
generally PCR Technology: Principles and Applications for DNA
Amplification (ed. H.A. Erlich, Freeman Press, NY, NY, 1992); PCR
Protocols: A Guide to Methods and Applications (eds. Innis et al.,
Academic Press, San Diego, CA, 1990); Mattila et al., Nucleic Acids
Res. 19, 4967 (1991); Eckert et al., PCR Methods and Applications 1,
17 (1991); PCR (eds. McPherson et al., IRL Press, Oxford); and U.S.
Patent 4,683,202 (each of which is incorporated by reference for all
purposes).

Other suitable amplification methods include the ligase chain
reaction (LCR) (see Wu and Wallace, Genomics 4, 560 (1989);
Landegren et al., Science 241, 1077 (1988)), transcription
amplification (Kwoh et al., Proc. Natl. Acad. Sci. USA 86, 1173
(1989)), self-sustained sequence replication (Guatelli et al., Proc.
Natl. Acad. Sci. USA 87, 1874 (1990)) and nucleic acid based sequence
amplification (NASBA). The latter two amplification methods involve
isothermal reactions based on isothermal transcription, which produce
both single-stranded RNA (ssRNA) and double-stranded DNA (dsDNA) as
the amplification products in a ratio of about 30 or 100 to 1,
respectively.

B. Detection of Polymorphisms in Target DNA

There are two distinct types of analysis, depending on whether the
polymorphism in question has already been characterized. The first
type of analysis is sometimes referred to as de novo
characterization. This analysis compares target sequences in
different individual plants to identify points of variation, i.e.,
polymorphic sites. The de novo identification of the polymorphisms of
the invention is described in the Examples section. The second type of



WO 98/30717 PCT/EP97/07134 

analysis is determining which form(s) of a characterized
polymorphism is (are) present in plants under test. A variety of
suitable procedures can be used; they are discussed in turn.

1. Allele-Specific Probes

The design and use of allele-specific probes for analyzing
polymorphisms is described by, e.g., Saiki et al., Nature 324,
163-166 (1986); Dattagupta, EP 235,726; and Saiki, WO 89/11548.
Allele-specific probes can be designed that hybridize to a segment of
target DNA from one member of a species but do not hybridize to the
corresponding segment from another member, owing to the presence of
different polymorphic forms in the respective segments from the two
members. Hybridization conditions should be sufficiently stringent
that there is a significant difference in hybridization intensity
between alleles, and preferably an essentially binary response,
whereby a probe hybridizes to only one of the alleles. Some probes
are designed to hybridize to a segment of target DNA such that the
polymorphic site aligns with a central position of the probe (e.g.,
position 8 of a 15-mer; position 8 or 9 of a 16-mer). This probe
design achieves good discrimination in hybridization between
different allelic forms.

Allele-specific probes are often used in pairs, one member of a pair
showing a perfect match to a reference form of a target sequence and
the other member showing a perfect match to a variant form. Several
pairs of probes can then be immobilized on the same support for
simultaneous analysis of multiple polymorphisms within the same
target sequence.
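The centring rule described above can be illustrated with a short Python sketch. It assumes that a Table I entry has been transcribed cleanly in the form LEFT[X/Y]RIGHT (flanking bases on each side, a single-base substitution, no OCR damage). The helper names `parse_snp` and `allele_specific_probe_pair` are hypothetical. The sketch writes probes on the same strand as the entry and ignores melting temperature and hybridization conditions, which a real probe design would also have to consider.

```python
from __future__ import annotations

import re

# Hypothetical parser for the LEFT[X/Y]RIGHT notation used in Table I.
SNP_ENTRY = re.compile(r"^([ACGT]+)\[([ACGT]+)/([ACGT]+)\]([ACGT]+)$")


def parse_snp(entry: str) -> tuple[str, str, str, str]:
    """Split 'LEFT[X/Y]RIGHT' into (left flank, allele X, allele Y, right flank)."""
    match = SNP_ENTRY.match(entry.replace(" ", "").upper())
    if match is None:
        raise ValueError(f"not a clean single-site entry: {entry!r}")
    left, allele_x, allele_y, right = match.groups()
    return left, allele_x, allele_y, right


def allele_specific_probe_pair(entry: str, length: int = 15) -> tuple[str, str]:
    """Return one probe per allele, each `length` bases long, with the
    polymorphic base at the central position (position 8 of a 15-mer;
    position 8 of a 16-mer)."""
    left, allele_x, allele_y, right = parse_snp(entry)
    if len(allele_x) != 1 or len(allele_y) != 1:
        raise ValueError("this sketch handles single-base substitutions only")
    n_left = (length - 1) // 2        # bases placed 5' of the polymorphic site
    n_right = length - 1 - n_left     # bases placed 3' of the polymorphic site
    if n_left > len(left) or n_right > len(right):
        raise ValueError("flanking sequence too short for the requested length")
    five_prime = left[len(left) - n_left:]
    three_prime = right[:n_right]
    return five_prime + allele_x + three_prime, five_prime + allele_y + three_prime


probe_c, probe_t = allele_specific_probe_pair(
    "CCTTACACTATTAACAGGCC[C/T]GTGATCTACCTGAATGCCTG")
print(probe_c, probe_t)
```

The two probes differ only at their central base, which is the arrangement described above for a probe pair.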

2. Tiling Arrays 

The polymorphisms can also be identified by hybridization to nucleic
acid arrays, some examples of which are described in WO 95/11995
(incorporated by reference in its entirety for all purposes). One
form of such arrays is described in the Examples section in
connection with de novo identification of polymorphisms. The same
array or a different array can be used for analysis of characterized
polymorphisms. WO 95/11995 also describes subarrays that are
optimized for detection of variant forms of a precharacterized
polymorphism. Such a subarray contains probes designed to be
complementary to a second reference sequence, which is an allelic
variant of the first reference sequence. The second group of probes
is designed by the same principles as described in the Examples,
except that the probes exhibit complementarity to the second
reference sequence. The inclusion of a second group (or further
groups) can be particularly useful for analysing short subsequences
of the primary reference sequence in which multiple mutations are
expected to occur within a distance commensurate with the length of
the probes (i.e., two or more mutations within 9 to 21 bases).
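As a rough illustration of the tiling idea, the following Python sketch generates, for every interrogated position of a reference sequence, four probes that are identical to the reference except at the central base, which is replaced in turn by A, C, G and T. This is a simplified, assumed version of the strategy referred to above: it writes probes on the same strand as the reference, whereas array probes are normally complementary to the target. `tile_probes` and the example sequences are hypothetical. Calling the same function on an allelic variant of the reference yields the second group of probes.

```python
from __future__ import annotations


def tile_probes(reference: str, length: int = 15) -> dict[int, dict[str, str]]:
    """Map each interrogated position (0-based index into `reference`) to
    four probes that differ only at their central base."""
    if length % 2 == 0:
        raise ValueError("use an odd probe length so there is a single central base")
    reference = reference.upper()
    half = length // 2
    tiles: dict[int, dict[str, str]] = {}
    for start in range(len(reference) - length + 1):
        window = reference[start:start + length]
        tiles[start + half] = {
            base: window[:half] + base + window[half + 1:] for base in "ACGT"
        }
    return tiles


# Hypothetical reference and an allelic variant with a substitution at index 10.
reference = "ACGTACGTACGTACGTACGT"
variant = reference[:10] + "T" + reference[11:]
first_group = tile_probes(reference)
second_group = tile_probes(variant)
```

At the substituted position itself, the probe of the first group carrying the variant base at its centre is identical to the perfect-match probe of the second group. The two groups differ at the neighbouring positions, whose probes span the substitution off-centre, which is why a second group helps where several differences fall within one probe length.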

3. Allele-Specific Primers

An allele-specific primer hybridizes to a site on target DNA
overlapping a polymorphism and only primes amplification of an
allelic form to which the primer exhibits perfect complementarity.
See Gibbs, Nucleic Acids Res. 17, 2427-2448 (1989). This primer is
used in conjunction with a second primer that hybridizes at a distal
site. Amplification proceeds from the two primers, leading to a
detectable product signifying that the particular allelic form is
present. A control is usually performed with a second pair of
primers, one of which shows a single-base mismatch at the polymorphic
site and the other of which exhibits perfect complementarity to a
distal site. The single-base mismatch prevents amplification, and no
detectable product is formed. The method works best when the mismatch
is included in the 3'-most position of the oligonucleotide aligned
with the polymorphism, because this position is most destabilizing
to elongation from the primer. See, e.g., WO 93/22456.
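The primer design described above can be sketched in Python as follows. The sketch builds forward primers whose 3'-most base sits on the polymorphic site and then applies the idealised rule from the text: a primer is extended only when its 3'-terminal base matches the template. The entry format (LEFT[X/Y]RIGHT, as in Table I), the helper names and the 20-base default length are assumptions. Real allele-specific PCR also depends on the distal primer, the annealing temperature and the polymerase, none of which is modelled here.

```python
from __future__ import annotations

import re

# Hypothetical parser for the LEFT[X/Y]RIGHT notation used in Table I.
SNP_ENTRY = re.compile(r"^([ACGT]+)\[([ACGT]+)/([ACGT]+)\]([ACGT]+)$")


def parse_snp(entry: str) -> tuple[str, str, str, str]:
    """Split 'LEFT[X/Y]RIGHT' into (left flank, allele X, allele Y, right flank)."""
    match = SNP_ENTRY.match(entry.replace(" ", "").upper())
    if match is None:
        raise ValueError(f"not a clean single-site entry: {entry!r}")
    left, allele_x, allele_y, right = match.groups()
    return left, allele_x, allele_y, right


def allele_specific_forward_primers(entry: str, length: int = 20) -> tuple[str, str]:
    """Forward primers (same strand as the entry) ending on each allele."""
    left, allele_x, allele_y, _right = parse_snp(entry)
    if len(allele_x) != 1 or len(allele_y) != 1:
        raise ValueError("this sketch handles single-base substitutions only")
    if not 2 <= length <= len(left) + 1:
        raise ValueError("primer must fit within the left flank plus one base")
    stem = left[len(left) - (length - 1):]
    return stem + allele_x, stem + allele_y


def primes_extension(primer: str, template_allele: str) -> bool:
    """Idealised rule: extension only if the primer's 3'-terminal base matches
    the allele present in the template (written on the entry's strand)."""
    return primer[-1] == template_allele


primer_a, primer_c = allele_specific_forward_primers(
    "CAAGAAGCCTCTTCAGTGTC[A/C]GTCGTAGCTTCCTCAAGACC")
```

In this model, a sample homozygous for A gives a product with `primer_a` only, and the `primer_c` reaction serves as the mismatched control.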
4. Direct Sequencing

The direct analysis of the sequence of polymorphisms of the present
invention can be accomplished using either the dideoxy chain
termination method or the Maxam-Gilbert method (see Sambrook et al.,
Molecular Cloning, A Laboratory Manual (2nd Ed., CSHP, New York
1989); Zyskind et al., Recombinant DNA Laboratory Manual (Acad.
Press, 1988)).



5. Denaturing Gradient Gel Electrophoresis

Amplification products generated using the polymerase chain reaction
can be analyzed by denaturing gradient gel electrophoresis. Different
alleles can be identified based on the different sequence-dependent
melting properties and electrophoretic migration of DNA in solution.
See Erlich, ed., PCR Technology, Principles and Applications for DNA
Amplification (W. H. Freeman and Co., New York, 1992), Chapter 7.

6. Single-Strand Conformation Polymorphism Analysis

Alleles of target sequences can be differentiated using
single-strand conformation polymorphism analysis, which identifies
base differences by alteration in electrophoretic migration of
single-stranded PCR products, as described in Orita et al., Proc.
Natl. Acad. Sci. USA 86, 2766-2770 (1989). Amplified PCR products can
be generated as described above, and heated or otherwise denatured,
to form single-stranded amplification products. Single-stranded
nucleic acids may refold or form secondary structures that are
partially dependent on the base sequence. The different
electrophoretic mobilities of single-stranded amplification products
can be related to base-sequence differences between alleles of
target sequences.



Example 5. Methods of Use
After determining the polymorphic form(s) present in a subject plant
at one or more polymorphic sites, this information can be used in a
number of methods.

A. Fingerprint Analysis 

Analysis of which polymorphisms are present in a plant is useful in
determining the strain of which the plant is a member and in
distinguishing one strain from another. A genetic fingerprint for an
individual strain can be made by determining the nucleic acid
sequence possessed by that strain in a region of the genome known to
contain polymorphisms. For a discussion of genetic fingerprinting in
the animal kingdom, see, for example, Stoneking et al., Am. J. Hum.
Genet. 48:370-382 (1991). The probability that the polymorphic forms
carried by one strain are all identical to those carried by any
other strain decreases as the number of polymorphic sites examined
increases.
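The effect of adding sites can be made concrete with a short calculation. The Python sketch below assumes fully inbred (homozygous) strains, known allele frequencies at each site and independent sites (no linkage); none of these assumptions comes from the text. Under them, two unrelated strains carry the same form at one site with a probability equal to the sum of the squared allele frequencies, and they match at all sites with the product of those per-site probabilities. The function names and frequencies are illustrative.

```python
from __future__ import annotations

from math import prod


def site_match_probability(freqs: list[float]) -> float:
    """Chance that two unrelated inbred strains carry the same allele at one
    site: the sum of the squared allele frequencies."""
    if abs(sum(freqs) - 1.0) > 1e-9:
        raise ValueError("allele frequencies at a site must sum to 1")
    return sum(p * p for p in freqs)


def fingerprint_match_probability(sites: list[list[float]]) -> float:
    """Chance that the two strains match at every site, assuming independence."""
    return prod(site_match_probability(freqs) for freqs in sites)


# Hypothetical biallelic sites with equal allele frequencies.
for n_sites in (1, 5, 10, 20):
    print(n_sites, fingerprint_match_probability([[0.5, 0.5]] * n_sites))
```

With equally frequent alleles, each additional site halves the chance of a coincidental match. Linked sites would make the true figure higher than this independence-based estimate.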

The comparison of the nucleic acid sequences 
from two strains at one or multiple polymorphic sites can 
also demonstrate common or disparate ancestry. Since the 
polymorphic sites are within a large region in the genome, 
the probability of recombination between these polymorphic 
sites is low. That low probability means the haplotype (the 
set of all the disclosed polymorphic sites) set forth in this 
application should be inherited without change for at least 
several generations. Knowledge of plant strain or ancestry is 
useful, for example, in a plant breeding program or in 
tracing progeny of a proprietary plant. Fingerprints are also 
used to identify an individual strain and to distinguish or 
determine the relatedness of one individual strain to 
another. Genetic fingerprinting can also be useful in hybrid 
certification, the certification of seed lots, and the 
assertion of plant breeders' rights under the laws of various
countries.



B. Correlation of Polymorphisms with Phenotypic Traits
The polymorphisms of the invention may
contribute to the phenotype of a plant in different ways. 
Some polymorphisms occur within a protein coding sequence and 
contribute to phenotype by affecting protein structure. The 
effect may be neutral, beneficial or detrimental, or both 
beneficial and detrimental, depending on the circumstances. 
Other polymorphisms occur in noncoding regions but may exert 
phenotypic effects indirectly via influence on replication, 
transcription, and translation. A single polymorphism may 
affect more than one phenotypic trait. Likewise, a single 
phenotypic trait may be affected by polymorphisms in 
different genes. Further, some polymorphisms predispose a 
plant to a distinct mutation that is causally related to a 
certain phenotype. 

Phenotypic traits include characteristics such 
as growth rate, crop yield, crop quality, resistance to 
pathogens, herbicides, and other toxins, nutrient 
requirements, resistance to high temperature, freezing, 
drought, requirements for light and soil type, aesthetics, 
and height. Other phenotypic traits include susceptibility or 
resistance to diseases, such as plant cancers. Often 
polymorphisms occurring within the same gene correlate with 
the same phenotype. 

Correlation is performed for a population of plants that have been
tested for the presence or absence of a phenotypic trait of interest
and for polymorphic marker sets. To perform such analysis, the
presence or absence of a set of polymorphisms (i.e., a polymorphic
set) is determined for a set of the plants, some of which exhibit a
particular trait and some of which lack the trait. The alleles of
each polymorphism of the set are then reviewed to determine whether
the presence or absence of a particular allele is associated with
the trait of interest. Correlation can be performed by standard
statistical methods, such as a chi-squared test, and statistically
significant correlations between polymorphic form(s) and phenotypic
characteristics are noted.
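When a single biallelic site is scored as allele present or absent, and the trait is scored the same way, a standard test is the Pearson chi-squared test on a 2 x 2 contingency table. The Python sketch below is one minimal way to compute it. The table layout, the helper names and the survey data are illustrative assumptions. The value 3.841 is the usual chi-squared critical value for one degree of freedom at p = 0.05. When many markers are tested, a statistics library and a correction for multiple testing would normally be used.

```python
from __future__ import annotations


def chi_squared_2x2(a: int, b: int, c: int, d: int) -> float:
    """Pearson chi-squared statistic (no continuity correction) for
                      trait present   trait absent
       allele present       a              b
       allele absent        c              d
    """
    n = a + b + c + d
    row1, row2, col1, col2 = a + b, c + d, a + c, b + d
    if 0 in (row1, row2, col1, col2):
        raise ValueError("every row and column total must be non-zero")
    return n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)


def allele_trait_table(plants: list[tuple[bool, bool]]) -> tuple[int, int, int, int]:
    """Count (allele present, trait present) observations into cells a, b, c, d."""
    a = sum(1 for allele, trait in plants if allele and trait)
    b = sum(1 for allele, trait in plants if allele and not trait)
    c = sum(1 for allele, trait in plants if not allele and trait)
    d = sum(1 for allele, trait in plants if not allele and not trait)
    return a, b, c, d


# Critical value of chi-squared with 1 degree of freedom at p = 0.05.
CRITICAL_1DF_P05 = 3.841

# Hypothetical survey of 80 plants.
plants = ([(True, True)] * 30 + [(True, False)] * 10
          + [(False, True)] * 10 + [(False, False)] * 30)
statistic = chi_squared_2x2(*allele_trait_table(plants))
print(statistic, statistic > CRITICAL_1DF_P05)
```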
Correlations between characteristics and phenotype are useful for
breeding for desired characteristics. By analogy, Beitz et al., US
5,292,639, discuss the use of bovine mitochondrial polymorphisms in a
breeding program to improve milk production in cows. To evaluate the
effect of mtDNA D-loop sequence polymorphism on milk production, each
cow was assigned a value of 1 if variant or 0 if wild type with
respect to a prototypical mitochondrial DNA sequence at each of 17
locations considered. Each production trait was analyzed
individually with the following animal model:

$$Y_{ijkpn} = \mu + YS_i + P_j + X_k + \beta_1 + \cdots + \beta_{17} + PE_n + a_n + e_p$$

where $Y_{ijkpn}$ is the milk, fat, fat percentage, SNF, SNF
percentage, energy concentration, or lactation energy record; $\mu$
is an overall mean; $YS_i$ is the effect common to all cows calving
in year-season $i$; $X_k$ is the effect common to cows in either the
high or average selection line; $\beta_1$ to $\beta_{17}$ are the
binomial regressions of production record on mtDNA D-loop sequence
polymorphisms; $PE_n$ is the permanent environmental effect common to
all records of cow $n$; $a_n$ is the effect of animal $n$ and is
composed of the additive genetic contribution of sire and dam
breeding values and a Mendelian sampling effect; and $e_p$ is a
random residual. It was found that eleven of the seventeen
polymorphisms tested influenced at least one production trait.
Bovines having the best polymorphic forms for milk production at
these eleven loci are used as parents for breeding the next
generation of the herd.

One can test at least several hundred markers simultaneously in order
to identify those linked to a gene or chromosomal region. For
example, to identify markers linked to a gene conferring disease
resistance, one DNA pool is constructed from plants of a segregating
population that are resistant to the disease, and another pool is
constructed from plants that are sensitive to it. The two DNA pools
are expected to be essentially identical except for the DNA
sequences at the resistance gene locus and in the surrounding
genomic area. Hybridization of such DNA pools to the DNA sequences
listed in Table I allows the simultaneous testing of several hundred
loci for polymorphisms. Polymorphism-detecting sequences that show
differences in hybridization patterns between such DNA pools are
likely to represent loci linked to the disease resistance gene.
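The comparison step can be sketched as follows. The sketch assumes that each pool's hybridization result has already been reduced to one number per locus, namely the fraction of signal attributable to one allele. The data layout, the 0.5 threshold, the locus names and the function name are all hypothetical. A real analysis would derive the threshold from replicate measurements and background signal.

```python
from __future__ import annotations


def candidate_linked_loci(resistant: dict[str, float],
                          sensitive: dict[str, float],
                          threshold: float = 0.5) -> list[str]:
    """Return loci whose allele-1 signal fraction (0 to 1) differs between
    the resistant and sensitive pools by at least `threshold`."""
    shared = resistant.keys() & sensitive.keys()
    return sorted(locus for locus in shared
                  if abs(resistant[locus] - sensitive[locus]) >= threshold)


# Hypothetical normalised signals for three loci.
resistant_pool = {"locus_1": 0.95, "locus_2": 0.52, "locus_3": 0.10}
sensitive_pool = {"locus_1": 0.05, "locus_2": 0.48, "locus_3": 0.85}
print(candidate_linked_loci(resistant_pool, sensitive_pool))
```

Unlinked loci are expected to give similar allele fractions in both pools, as `locus_2` does here, so they fall below the threshold.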

The method just described can also be applied to rapidly identify
rare alleles in large populations of plants.
For example, nucleic acid pools are constructed from several 
individuals of a large population. The nucleic acid pools are 
hybridized to nucleic acids having the polymorphism-detecting 
sequences listed in Table I. The detection of a rare 
hybridization profile will indicate the presence of a rare 
allele in a specific nucleic acid pool. RNA pools are 
particularly suited to identify differences in gene 
expression. 

C. Marker-Assisted Back-Cross

The markers are used to select, in back-cross populations, the
plants that have the highest percentage of recurrent-parent genome
while still retaining the genes contributed by the donor plant.
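The selection can be sketched in Python under simple, stated assumptions. Each back-cross plant is scored at one target marker linked to the donor gene and at a set of background markers. Each score is the number (0, 1 or 2) of recurrent-parent alleles at that marker. Plants that have lost the donor allele at the target are discarded, and the rest are ranked by their percentage of recurrent-parent alleles. The scoring scheme, marker names and plant data are hypothetical.

```python
from __future__ import annotations


def recurrent_parent_percentage(genotype: dict[str, int], background: list[str]) -> float:
    """Percentage of recurrent-parent alleles over the background markers;
    genotype[marker] is the number (0, 1 or 2) of recurrent-parent alleles."""
    return 100.0 * sum(genotype[m] for m in background) / (2 * len(background))


def select_backcross_plants(population: dict[str, dict[str, int]],
                            target: str, background: list[str],
                            keep: int = 1) -> list[str]:
    """Keep plants still carrying a donor allele at the target marker, then
    rank them by recurrent-parent percentage, highest first."""
    carriers = [name for name, g in population.items() if g[target] < 2]
    carriers.sort(key=lambda name: recurrent_parent_percentage(population[name], background),
                  reverse=True)
    return carriers[:keep]


# Hypothetical plants scored at target marker "t" and background markers m1 to m3.
population = {
    "plant_1": {"t": 1, "m1": 2, "m2": 2, "m3": 1},
    "plant_2": {"t": 2, "m1": 2, "m2": 2, "m3": 2},  # lost the donor allele at "t"
    "plant_3": {"t": 1, "m1": 1, "m2": 1, "m3": 2},
}
print(select_backcross_plants(population, "t", ["m1", "m2", "m3"], keep=2))
```

Filtering on the target marker before ranking keeps a plant like `plant_2` out of the selection, even though its background is entirely recurrent-parent.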



Example 6. Modified Polypeptides and Gene Sequences

The invention further provides variant forms of nucleic acids and
corresponding proteins. The nucleic acids comprise at least 10
contiguous nucleotides of one of the sequences, for example as
described in Table I, in any of the allelic forms shown. Some of the
nucleic acids encode full-length proteins.

Genes can be expressed in an expression vector in which a gene is
operably linked to a native or other promoter. Usually, the promoter
is a eukaryotic promoter for expression in a eukaryotic cell. The
transcription regulation sequences typically include a heterologous
promoter and optionally an enhancer that is recognized by the host.
The selection of an appropriate promoter, for example trp, lac,
phage promoters, glycolytic enzyme promoters and tRNA promoters,
depends on the host selected. Commercially available expression
vectors can be used. Vectors can include host-recognized replication
systems, amplifiable genes, selectable markers, host sequences
useful for insertion into the host genome, and the like.

The means of introducing the expression construct into a host cell
varies depending upon the particular construction and the target
host. Suitable means include fusion, conjugation, transfection,
transduction, electroporation or injection, as described in
Sambrook, supra. A wide variety of host cells, both prokaryotic and
eukaryotic, can be employed for expression of the variant gene.
Suitable host cells include bacteria such as E. coli, yeast,
filamentous fungi, insect cells, mammalian cells (typically
immortalized, e.g., mouse, CHO, human and monkey cell lines and
derivatives thereof) and plant cells. Preferred host cells are able
to process the variant gene product to produce an appropriate mature
polypeptide. Processing includes glycosylation, ubiquitination,
disulfide bond formation, general post-translational modification,
and the like.

The DNA fragments are introduced into cultured plant cells by
standard methods, including electroporation (Fromm et al., Proc.
Natl. Acad. Sci. USA 82, 5824 (1985)), infection by viral vectors
such as cauliflower mosaic virus (CaMV) (Hohn et al., Molecular
Biology of Plant Tumors (Academic Press, New York, 1982) pp.
549-560; Howell, US 4,407,956), high-velocity ballistic penetration
by small particles with the nucleic acid either within the matrix of
small beads or particles or on the surface (Klein et al., Nature
327, 70-73 (1987)), use of pollen as vector (WO 85/01856), or use of
Agrobacterium tumefaciens transformed with a Ti plasmid in which DNA
fragments are cloned. The Ti plasmid is transmitted to plant cells
upon infection by Agrobacterium tumefaciens and is stably integrated
into the plant genome (Horsch et al., Science 223, 496-498 (1984);
Fraley et al., Proc. Natl. Acad. Sci. USA 80, 4803 (1983)).

The protein may be isolated by conventional means of protein
biochemistry and purification to obtain a substantially pure
product, i.e., 80, 95 or 99% free of cell component contaminants, as
described in Jacoby, Methods in Enzymology Volume 104, Academic
Press, New York (1984); Scopes, Protein Purification, Principles and
Practice, 2nd Edition, Springer-Verlag, New York (1987); and
Deutscher (ed.), Guide to Protein Purification, Methods in
Enzymology, Vol. 182 (1990). If the protein is secreted, it can be
isolated from the supernatant in which the host cell is grown. If it
is not secreted, the protein can be isolated from a lysate of the
host cells.

The invention further provides transgenic plants capable of
expressing an exogenous variant gene and/or having one or both
alleles of an endogenous variant gene inactivated. Plant
regeneration from cultured protoplasts is described in Evans et al.,
"Protoplast Isolation and Culture," Handbook of Plant Cell Cultures
1, 124-176 (MacMillan Publishing Co., New York, 1983); Davey, "Recent
Developments in the Culture and Regeneration of Plant Protoplasts,"
Protoplasts (1983), pp. 12-29 (Birkhauser, Basel 1983); Dale,
"Protoplast Culture and Plant Regeneration of Cereals and Other
Recalcitrant Crops," Protoplasts (1983), pp. 31-41 (Birkhauser, Basel
1983); Binding, "Regeneration of Plants," Plant Protoplasts, pp.
21-73 (CRC Press, Boca Raton, 1985). For example, a variant gene
responsible for a disease-resistant phenotype can be introduced into
the plant to simulate that phenotype. Expression of an exogenous
variant gene is usually achieved by operably linking the gene to a
promoter and optionally an enhancer. Inactivation of an endogenous
variant gene can be achieved by forming a transgene in which a
cloned variant gene is inactivated by insertion of a positive
selection marker. See Capecchi, Science 244, 1288-1292 (1989). Such
transgenic plants are useful in a variety of screening assays. For
example, the transgenic plant can be treated with compounds of
interest and the effect of those compounds on disease resistance can
be monitored. In another example, the transgenic plant can be
exposed to a variety of environmental conditions to determine the
effect of those conditions on resistance to the disease.

In addition to substantially full-length 
polypeptides, the present invention includes biologically 
active fragments of the polypeptides, or analogs thereof, 
including organic molecules which simulate the interactions 
of the peptides. Biologically active fragments include any 
portion of the full-length polypeptide which confers a 
biological function on the variant gene product, including 
ligand binding and antibody binding. Ligand binding includes
binding by nucleic acids, proteins or polypeptides, small 
biologically active molecules, or large cellular structures. 

Polyclonal and/or monoclonal antibodies that specifically bind to one
allelic gene product but not to a second allelic gene product are
also provided. Antibodies can be made by injecting mice or other
animals with the variant gene product or synthetic peptide fragments
thereof. Monoclonal antibodies are screened as described, for
example, in Harlow & Lane, Antibodies, A Laboratory Manual, Cold
Spring Harbor Press, New York (1988); and Goding, Monoclonal
Antibodies, Principles and Practice (2d ed.), Academic Press, New
York (1986). Monoclonal antibodies are tested for specific
immunoreactivity with a variant gene product and lack of
immunoreactivity to the corresponding prototypical gene product.
These antibodies are useful in diagnostic assays for detection of the
variant form, or as an active ingredient in a pharmaceutical
composition.

Example 7. Kits

The invention further provides kits comprising at least one
allele-specific oligonucleotide as described above. Often, the kits
contain one or more pairs of allele-specific oligonucleotides
hybridizing to different forms of a polymorphism. In some kits, the
allele-specific oligonucleotides are provided immobilized to a
substrate. For example, the same substrate can comprise
allele-specific oligonucleotide probes for detecting at least 10,
100 or all of the polymorphisms shown in Table I. Optional additional
components of the kit include, for example, restriction enzymes,
reverse transcriptase or polymerase, the substrate nucleoside
triphosphates, means used to label (for example, an avidin-enzyme
conjugate and enzyme substrate and chromogen if the label is
biotin), and the appropriate buffers for reverse transcription, PCR,
or hybridization reactions. Usually, the kit also contains
instructions for carrying out the methods.
CLAIMS
1. A nucleic acid segment comprising at least 10
contiguous nucleotides from a vegetal sequence including a
polymorphic site, notably a single nucleotide polymorphism
(SNP), or the complement of the segment.

2. A nucleic acid segment of claim 1, which is 
comprised in the sequence shown in Table I. 

3. A nucleic acid segment of claim 1, less than 100 bases.

4. A nucleic acid segment of claim 1, that is DNA.

5. A nucleic acid segment of claim 1, that is RNA.

6. The segment of claim 1 that is less than 50 bases.

7. The segment of claim 1, that is less than 20 bases.



8. An allele-specific oligonucleotide that hybridizes to a sequence
of claim 1 or its complement.

9. An allele-specific oligonucleotide that hybridizes to a sequence
of claim 8, the sequence being shown in Table I.

10. The allele-specific oligonucleotide of claim 8, that is a probe.

11. The allele-specific oligonucleotide of claim 10, wherein the
central position of the probe aligns with the polymorphic site in
the sequence.

12. The allele-specific oligonucleotide of claim 8, that is a
primer.

13. The allele-specific oligonucleotide of claim 12, wherein the
primer comprises a sequence shown in Table I.

14. The allele-specific oligonucleotide of claim 12, wherein the 3'
end of the primer comprises a sequence shown in Table I.

15. A method of analysing a nucleic acid, comprising: obtaining the
nucleic acid from a subject; and determining a base occupying any
one of the polymorphic sites shown in Table I.

16. The method of claim 15, wherein the 
determining comprises determining a set of bases occupying a 
set of the polymorphic sites shown in Table I. 

17. The method of claim 16, wherein the nucleic 
acid is obtained from a plurality of subjects, and a base 
occupying one of the polymorphic positions is determined in 
each of the subjects, and the method further comprises 
testing each subject for the presence of a phenotype, and 
correlating the presence of the phenotype with the base. 

18. A kit comprising at least one allele-specific oligonucleotide of
claim 1 and optional additional components (enzymes, buffers,
instructions, etc.).

19. A kit according to claim 18, comprising at least one
allele-specific oligonucleotide of claim 2.

20. Use of the nucleic acid segments according to claims 1 to 17 to
demonstrate common or disparate ancestry.

21. Use of the nucleic acid segments according to claims 1 to 17 in
plant breeding.

22. Use of the nucleic acid segments according to claims 1 to 17 to
trace progeny of a proprietary plant.

23. Use of the nucleic acid segments according to claims 1 to 17 in
hybrid certification.

24. Use of the nucleic acid segments according to claims 1 to 17 to
select in a back-cross population the plants that have the higher
percentage of recurrent parent (marker-assisted back-cross).

25. Use of the nucleic acid segments according to
claims 1 to 17, wherein all or most of the polymorphisms
are linked to a group of genes involved in a given
metabolic pathway.

26. Use according to claim 25, wherein the metabolic
pathway is selected from the oil metabolic pathway, the
starch metabolic pathway, the protein metabolic pathway, the
amino acid metabolic pathway, the lignin and cell wall
composition metabolic pathway, and the pathogen resistance
pathway.
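The marker-assisted back-cross of claim 24 ranks back-cross progeny by how much of their genome comes from the recurrent parent. The following is a minimal Python sketch of that ranking, not the applicant's procedure. It assumes diploid genotype calls at each polymorphic site, homozygous (inbred) parents given as one base per site, and hypothetical site names and bases that do not come from Table I. Only sites where the two parents differ are scored. A plant's recurrent-parent percentage is the share of its alleles at those sites that match the recurrent parent.

```python
from typing import Dict, List, Tuple

# A diploid genotype call at one polymorphic site: the two bases observed.
Genotype = Tuple[str, str]


def recurrent_parent_percentage(
    plant: Dict[str, Genotype],
    recurrent: Dict[str, str],
    donor: Dict[str, str],
) -> float:
    """Percentage of alleles at informative sites that match the recurrent parent.

    Assumption: both parents are homozygous inbred lines, so each is given as a
    single base per site. A site is informative only if both parents and the
    plant were genotyped there and the two parental bases differ.
    """
    informative = [
        site
        for site in recurrent
        if site in donor and site in plant and recurrent[site] != donor[site]
    ]
    if not informative:
        raise ValueError("no informative polymorphic sites")
    # Count how many of the plant's two alleles equal the recurrent-parent base.
    matching = sum(plant[site].count(recurrent[site]) for site in informative)
    return 100.0 * matching / (2 * len(informative))


def select_backcross_plants(
    population: Dict[str, Dict[str, Genotype]],
    recurrent: Dict[str, str],
    donor: Dict[str, str],
    keep: int,
) -> List[Tuple[str, float]]:
    """Rank plants by recurrent-parent percentage and return the top `keep`."""
    scored = [
        (name, recurrent_parent_percentage(calls, recurrent, donor))
        for name, calls in population.items()
    ]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:keep]


# Hypothetical data: site names, plant names and bases are made up.
recurrent = {"site1": "A", "site2": "G", "site3": "C", "site4": "T"}
donor = {"site1": "G", "site2": "G", "site3": "T", "site4": "C"}  # site2 not informative
population = {
    "BC1-01": {"site1": ("A", "A"), "site2": ("G", "G"), "site3": ("C", "T"), "site4": ("T", "T")},
    "BC1-02": {"site1": ("A", "G"), "site2": ("G", "G"), "site3": ("C", "T"), "site4": ("T", "C")},
    "BC1-03": {"site1": ("A", "A"), "site2": ("G", "G"), "site3": ("C", "C"), "site4": ("T", "T")},
}

for name, pct in select_backcross_plants(population, recurrent, donor, keep=2):
    print(f"{name}: {pct:.1f}% recurrent parent")
```

With these made-up calls, the sketch keeps BC1-03 (100.0%) and BC1-01 (83.3%) and drops BC1-02 (50.0%). Every informative site is weighted equally here. This is a simplification, because a real scheme might weight sites by the genome segment each one represents.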

2. □ Claims Nos.:

because they relate to parts of the International Application that do not comply with the prescribed requirements to such
an extent that no meaningful International Search can be carried out, specifically:



3. □ Claims Nos.:

because they are dependent claims and are not drafted in accordance with the second and third sentences of Rule 6.4(a).



Box II Observations where unity of invention is lacking (Continuation of item 2 of first sheet)



This International Searching Authority found multiple inventions in this international application, as follows:

see FURTHER INFORMATION sheet 



1. □ As all required additional search fees were timely paid by the applicant, this International Search Report covers all
searchable claims.

2. □ As all searchable claims could be searched without effort justifying an additional fee, this Authority did not invite payment
of any additional fee.



3. □ As only some of the required additional search fees were timely paid by the applicant, this International Search Report
covers only those claims for which fees were paid, specifically claims Nos.:



4. ☒ No required additional search fees were timely paid by the applicant. Consequently, this International Search Report is
restricted to the invention first mentioned in the claims; it is covered by claims Nos.:

see FURTHER INFORMATION sheet, subject 1. 



Remark on Protest □ The additional search fees were accompanied by the applicant's protest.

□ No protest accompanied the payment of additional search fees.



Form PCT/ISA/210 (continuation of first sheet (1)) (July 1998)



International Application No. PCT/EP 97/07134



FURTHER INFORMATION CONTINUED FROM PCT/ISA/210



This International Searching Authority found multiple (groups of) 
inventions in this international application, as follows: 

1. Claims: 1-26 (partial) 

INVENTION 1:

A nucleic acid segment from a vegetal sequence including a 
polymorphic site, and in particular SEQ ID NOs: 67 and 68 
(Bt2 gene/marker from maize), an allele-specific 
oligonucleotide hybridizing to such sequences or their 
complements, a method of analyzing them, a kit containing 
such allele-specific oligonucleotides, and the use of such 
sequences. 



2. Claims: 1-26 (partial) 

INVENTION 2:

SEQ ID NOs: 69 to 76 (Ssu gene/marker from maize), an 
allele-specific oligonucleotide hybridizing to these 
sequences or their complements, a method of analyzing them, 
a kit containing such allele-specific oligonucleotides, and 
the use of such sequences. 



3. Claims: 1-26 (partial) 

INVENTION 3:

SEQ ID NOs: 77 to 82 (Bt1 gene/marker from maize), an
allele-specific oligonucleotide hybridizing to these 
sequences or their complements, a method of analyzing them, 
a kit containing such allele-specific oligonucleotides, and 
the use of such sequences. 



4. Claims: 1-26 (partial) 
INVENTION 4: 

SEQ ID NOs: 83 to 90 (Brel gene/marker from maize), an
allele-specific oligonucleotide hybridizing to these 
sequences or their complements, a method of analyzing them, 
a kit containing such allele-specific oligonucleotides, and 
the use of such sequences. 



5. Claims: 1-26 (partial) 
INVENTION 5: 

SEQ ID NOs: 91 to 104 (ASG12 gene/marker from maize), an 
allele-specific oligonucleotide hybridizing to these 
sequences or their complements, a method of analyzing them, 
a kit containing such allele-specific oligonucleotides, and 
the use of such sequences. 

6. Claims: 1-26 (partial)
INVENTION 6: 

SEQ ID NOs: 105 to 114 (Sh2 gene/marker from maize), an 
allele-specific oligonucleotide hybridizing to these 
sequences or their complements, a method of analyzing them, 
a kit containing such allele-specific oligonucleotides, and 
the use of such sequences. 



7. Claims: 1-26 (partial) 
INVENTION 7: 

SEQ ID NOs: 115 to 132 (Sh1 gene/marker from maize), an
allele-specific oligonucleotide hybridizing to these 
sequences or their complements, a method of analyzing them, 
a kit containing such allele-specific oligonucleotides, and 
the use of such sequences. 



8. Claims: 1-26 (partial) 
INVENTION 8: 

SEQ ID NOs: 133 to 144 (UAZ77 gene/marker from maize), an 
allele-specific oligonucleotide hybridizing to these 
sequences or their complements, a method of analyzing them, 
a kit containing such allele-specific oligonucleotides, and 
the use of such sequences. 



9. Claims: 1-26 (partial) 
INVENTION 9: 

SEQ ID NOs: 145 and 146 (UAZ171 gene/marker from maize), an
allele-specific oligonucleotide hybridizing to these
sequences or their complements, a method of analyzing them,
a kit containing such allele-specific oligonucleotides, and
the use of such sequences.



10. Claims: 1-26 (partial) 
INVENTION 10: 

SEQ ID NOs: 147 to 150 (UMC17 gene/marker from maize), an 
allele-specific oligonucleotide hybridizing to these 
sequences or their complements, a method of analyzing them, 
a kit containing such allele-specific oligonucleotides, and 
the use of such sequences. 



11. Claims: 1-26 (partial) 
INVENTION 11: 

SEQ ID NOs: 151 to 178 (CSU109 gene/marker from maize), an 
allele-specific oligonucleotide hybridizing to these 
sequences or their complements, a method of analyzing them,
a kit containing such allele-specific oligonucleotides, and 
the use of such sequences. 

12. Claims: 1-26 (partial) 

INVENTION 12:

SEQ ID NOs: 179 to 180 (UMC13G gene/marker from maize), an 
allele-specific oligonucleotide hybridizing to these 
sequences or their complements, a method of analyzing them, 
a kit containing such allele-specific oligonucleotides, and 
the use of such sequences.

13. Claims: 1-26 (partial) 

INVENTION 13:

SEQ ID NOs: 181 to 212 (CSU61 gene/marker from maize), an 
allele-specific oligonucleotide hybridizing to these 
sequences or their complements, a method of analyzing them, 
a kit containing such allele-specific oligonucleotides, and 
the use of such sequences. 

14. Claims: 1-26 (partial) 

INVENTION 14:

SEQ ID NOs: 213 to 234 (UMC95 gene/marker from maize), an 
allele-specific oligonucleotide hybridizing to these 
sequences or their complements, a method of analyzing them, 
a kit containing such allele-specific oligonucleotides, and 
the use of such sequences. 

15. Claims: 1-26 (partial) 

INVENTION 15:

SEQ ID NOs: 235 to 290 (Wx1 gene/marker from maize), an
allele-specific oligonucleotide hybridizing to these 
sequences or their complements, a method of analyzing them, 
a kit containing such allele-specific oligonucleotides, and 
the use of such sequences. 



16. Claims: 1-26 (partial) 

INVENTION 16:

SEQ ID NOs: 291 to 300 (UMC109 gene/marker from maize), an 
allele-specific oligonucleotide hybridizing to these 
sequences or their complements, a method of analyzing them, 
a kit containing such allele-specific oligonucleotides, and 
the use of such sequences. 

17. Claims: 1-26 (partial)
INVENTION 17: 

SEQ ID NOs: 301 to 320 (UMC80 gene/marker from maize), an 
allele-specific oligonucleotide hybridizing to these 
sequences or their complements, a method of analyzing them, 
a kit containing such allele-specific oligonucleotides, and 
the use of such sequences. 



18. Claims: 1-26 (partial) 
INVENTION 18: 

SEQ ID NOs: 321 to 358 (UMC254 gene/marker from maize), an 
allele-specific oligonucleotide hybridizing to these 
sequences or their complements, a method of analyzing them, 
a kit containing such allele-specific oligonucleotides, and 
the use of such sequences. 



19. Claims: 1-26 (partial) 
INVENTION 19: 

SEQ ID NOs: 359 to 366 (ASG49 gene/marker from maize), an 
allele-specific oligonucleotide hybridizing to these 
sequences or their complements, a method of analyzing them, 
a kit containing such allele-specific oligonucleotides, and 
the use of such sequences. 



20. Claims: 1-26 (partial) 
INVENTION 20: 

SEQ ID NOs: 367 to 370 (ASG8 gene/marker from maize), an 
allele-specific oligonucleotide hybridizing to these 
sequences or their complements, a method of analyzing them, 
a kit containing such allele-specific oligonucleotides, and 
the use of such sequences. 



21. Claims: 1-26 (partial) 
INVENTION 21: 

SEQ ID NOs: 371 to 374 (UMC132 gene/marker from maize), an 
allele-specific oligonucleotide hybridizing to these 
sequences or their complements, a method of analyzing them, 
a kit containing such allele-specific oligonucleotides, and 
the use of such sequences. 



22. Claims: 1-26 (partial) 
INVENTION 22: 

SEQ ID NOs: 375 to 406 (UMC21 gene/marker from maize), an 
allele-specific oligonucleotide hybridizing to these 
sequences or their complements, a method of analyzing them,
a kit containing such allele-specific oligonucleotides, and
the use of such sequences. 



23. Claims: 1-26 (partial) 
INVENTION 23: 

SEQ ID NOs: 407 to 416 (UMC65 gene/marker from maize), an
allele-specific oligonucleotide hybridizing to these 
sequences or their complements, a method of analyzing them, 
a kit containing such allele-specific oligonucleotides, and 
the use of such sequences. 



24. Claims: 1-26 (partial) 
INVENTION 24: 

SEQ ID NOs: 417 to 434 (UMC59 gene/marker from maize), an 
allele-specific oligonucleotide hybridizing to these 
sequences or their complements, a method of analyzing them, 
a kit containing such allele-specific oligonucleotides, and 
the use of such sequences. 



25. Claims: 1-26 (partial) 
INVENTION 25: 

SEQ ID NOs: 435 to 450 (Acl gene/marker from maize), an 
allele-specific oligonucleotide hybridizing to these 
sequences or their complements, a method of analyzing them, 
a kit containing such allele-specific oligonucleotides, and 
the use of such sequences. 



26. Claims: 1-26 (partial) 
INVENTION 26: 

SEQ ID NOs: 451 to 456 (UMC90 gene/marker from maize), an 
allele-specific oligonucleotide hybridizing to these 
sequences or their complements, a method of analyzing them, 
a kit containing such allele-specific oligonucleotides, and 
the use of such sequences. 



27. Claims: 1-26 (partial) 
INVENTION 27: 

SEQ ID NOs: 457 to 460 (UMC66 gene/marker from maize), an 
allele-specific oligonucleotide hybridizing to these 
sequences or their complements, a method of analyzing them, 
a kit containing such allele-specific oligonucleotides, and 
the use of such sequences. 

28. Claims: 1-26 (partial)
INVENTION 28: 

SEQ ID NOs: 461 to 464 (Adh2 gene/marker from maize), an 
allele-specific oligonucleotide hybridizing to these 
sequences or their complements, a method of analyzing them,
a kit containing such allele-specific oligonucleotides, and 
the use of such sequences. 



29. Claims: 1-26 (partial) 
INVENTION 29: 

SEQ ID NOs: 465 to 482 (UMC63 gene/marker from maize), an
allele-specific oligonucleotide hybridizing to these 
sequences or their complements, a method of analyzing them, 
a kit containing such allele-specific oligonucleotides, and 
the use of such sequences. 



30. Claims: 1-26 (partial) 
INVENTION 30: 

SEQ ID NOs: 483 to 486 (UMC102 gene/marker from maize), an 
allele-specific oligonucleotide hybridizing to these 
sequences or their complements, a method of analyzing them, 
a kit containing such allele-specific oligonucleotides, and 
the use of such sequences. 



31. Claims: 1-26 (partial) 
INVENTION 31: 

SEQ ID NOs: 487 to 490 (ASG24 gene/marker from maize), an 
allele-specific oligonucleotide hybridizing to these 
sequences or their complements, a method of analyzing them, 
a kit containing such allele-specific oligonucleotides, and 
the use of such sequences. 



32. Claims: 1-26 (partial) 
INVENTION 32: 

SEQ ID NOs: 491 to 522 (UMC49 gene/marker from maize), an 
allele-specific oligonucleotide hybridizing to these 
sequences or their complements, a method of analyzing them, 
a kit containing such allele-specific oligonucleotides, and 
the use of such sequences. 



33. Claims: 1-26 (partial) 
INVENTION 33: 

SEQ ID NOs: 523 to 534 (UMC131 gene/marker from maize), an 
allele-specific oligonucleotide hybridizing to these 
sequences or their complements, a method of analyzing them,
a kit containing such allele-specific oligonucleotides, and
the use of such sequences.


34. Claims: 1-26 (partial)
INVENTION 34:

SEQ ID NOs: 535 to 552 (UMC53 gene/marker from maize), an
allele-specific oligonucleotide hybridizing to these
sequences or their complements, a method of analyzing them,
a kit containing such allele-specific oligonucleotides, and
the use of such sequences.


35. Claims: 1-26 (partial)
INVENTION 35:

SEQ ID NOs: 553 to 558 (UMC161 gene/marker from maize), an
allele-specific oligonucleotide hybridizing to these
sequences or their complements, a method of analyzing them,
a kit containing such allele-specific oligonucleotides, and
the use of such sequences.


36. Claims: 1-26 (partial)
INVENTION 36:

SEQ ID NOs: 559 and 560 (UMC107 gene/marker from maize), an
allele-specific oligonucleotide hybridizing to these
sequences or their complements, a method of analyzing them,
a kit containing such allele-specific oligonucleotides, and
the use of such sequences.


37. Claims: 1-26 (partial)
INVENTION 37:

SEQ ID NOs: 561 to 564 (UMC67 gene/marker from maize), an
allele-specific oligonucleotide hybridizing to these
sequences or their complements, a method of analyzing them,
a kit containing such allele-specific oligonucleotides, and
the use of such sequences.


38. Claims: 1-26 (partial)
INVENTION 38:

SEQ ID NOs: 565 to 590 (UMC76 gene/marker from maize), an
allele-specific oligonucleotide hybridizing to these
sequences or their complements, a method of analyzing them,
a kit containing such allele-specific oligonucleotides, and
the use of such sequences.